[EMAIL PROTECTED] wrote:
Mike Matrigali <[EMAIL PROTECTED]> writes:
It would be fine to use an unchecked and/or an ASSERT-based check for
readFieldLengthAndSetStreamPosition. The "store" module owns this
access and is not counting on limit checks to catch anything here.
Another question:
All observed calls to setLimit() in this load (a single-tuple select) come
from the same method: StoredPage.readRecordFromArray(...).
(setLimit is also called from readRecordFromStream(), but that does not
seem to happen with this type of load.)
And the argument to setLimit() is always the local variable fieldDataLength,
which is the return value from
StoredFieldHeader.readFieldLengthAndSetStreamPosition().
So if readFieldLengthAndSetStreamPosition() can update the position without
checking, presumably the return value from this method can be trusted
as well? Is it then necessary to check this value again in setLimit(),
or could we have used an unchecked version of setLimit() here?
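To make the question concrete, here is a minimal sketch of the two variants.
This is not Derby's actual ArrayInputStream; the class and method names are
hypothetical, chosen only to illustrate the checked/unchecked distinction:

    import java.io.IOException;

    class LimitedArrayStream {
        private final byte[] pageData;
        private int position; // current read offset into pageData
        private int end;      // one past the last readable byte

        LimitedArrayStream(byte[] pageData) {
            this.pageData = pageData;
            this.end = pageData.length;
        }

        // Checked variant: rejects a limit that would run past the buffer.
        void setLimit(int length) throws IOException {
            if (length < 0 || position + length > pageData.length) {
                throw new IOException("limit out of range: " + length);
            }
            end = position + length;
        }

        // Unchecked variant: trusts the caller-supplied length completely.
        void setLimitNoCheck(int length) {
            end = position + length;
        }

        // The limit is what stops a read from running past the field.
        int read() throws IOException {
            if (position >= end) {
                throw new IOException("read past limit");
            }
            return pageData[position++] & 0xff;
        }
    }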
I'm worried by this approach of removing the checks on the limit or the
position; it's much like saying we don't need bounds checking on arrays
because I know my code is correct.
The current code provides some protection against a software bug, a
corrupted page, or a hacked page. Removing those checks may lead to
hard-to-detect bugs where a position and/or limit is calculated incorrectly
and subsequently leads to corrupted data or random exceptions being thrown.
My feeling is that the integrity of the store and the code is better
served by keeping these checks.
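As a hypothetical illustration of the failure mode, again using the
LimitedArrayStream sketch above rather than Derby's real classes: a bogus
length read from a corrupted field header fails fast on the checked path,
but is silently accepted on the unchecked one.

    import java.io.IOException;

    class CorruptedLengthDemo {
        public static void main(String[] args) {
            byte[] page = new byte[4096];
            LimitedArrayStream in = new LimitedArrayStream(page);
            int fieldDataLength = 1 << 20; // bogus length from a damaged header
            try {
                in.setLimit(fieldDataLength);    // checked: fails immediately
            } catch (IOException expected) {
                System.out.println("caught: " + expected.getMessage());
            }
            in.setLimitNoCheck(fieldDataLength); // unchecked: accepted; the
            // damage only surfaces later as corrupted data or a random
            // exception somewhere far from the actual bug
        }
    }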
I also think we need more performance numbers to justify such a change;
a single set of runs on a single VM does not justify it. I will run
numbers on Linux with a number of VMs when I get the chance.
Also, in these cases it is often better to optimize at a higher
level rather than at the lowest level (especially when the low-level
optimization removes such checks). In this case, see whether the number of
calls to setLimit() or setPosition() could be reduced, rather than
micro-optimizing these methods by changing their core functionality.
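For instance, a sketch under the same hypothetical classes as above: the
checked call could be made once per record instead of once per field, with
the per-field positioning then free to take the unchecked path.

    import java.io.IOException;

    class RecordReader {
        static void readRecord(LimitedArrayStream in, int[] fieldLengths)
                throws IOException {
            int recordLength = 0;
            for (int len : fieldLengths) {
                recordLength += len;
            }
            // One checked call covering the whole record...
            in.setLimit(recordLength);
            // ...so the per-field limits can take the unchecked path: each
            // field is already known to lie inside the validated extent.
            for (int len : fieldLengths) {
                in.setLimitNoCheck(len);
                // ... read this field's bytes here ...
            }
        }
    }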
As an example, take the setLimit() call around the readExternalFromArray
method. Maybe this responsibility could be pushed into the data type
itself, and for some built-in types we could trust their read mechanism to
read the correct amount of data. E.g. reading an INTEGER will always read
four bytes, so there is no need to set a limit around it. The limit exists
to support types that do not know their length on disk (e.g. some character
types, some binary types, user-defined types) and to support arbitrary user
types, where the engine cannot trust or require that the de-serialization
will read the complete stream and only its data.
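A sketch of what pushing that responsibility into the type might look like,
with hypothetical method names (only DataInputStream.readInt() is a real
API, and LimitedArrayStream is the illustrative class from above):

    import java.io.DataInputStream;
    import java.io.IOException;

    class TypeAwareFieldReader {
        // A fixed-width built-in type: an INTEGER always occupies four
        // bytes on disk, so readInt() cannot read past the field and no
        // surrounding limit is needed.
        static int readIntField(DataInputStream in) throws IOException {
            return in.readInt();
        }

        // A type of unknown on-disk length (some character and binary
        // types, user-defined types): the engine cannot trust the
        // de-serialization to stop at the field boundary, so the read
        // is fenced with a limit first.
        static byte[] readVarField(LimitedArrayStream in, int fieldDataLength)
                throws IOException {
            in.setLimit(fieldDataLength);
            // ... invoke the type's own read method here ...
            return new byte[fieldDataLength]; // placeholder for the bytes read
        }
    }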
Dan.