[
https://issues.apache.org/jira/browse/ACCUMULO-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548529#comment-13548529
]
Keith Turner commented on ACCUMULO-625:
---------------------------------------
An it sounds like you are really digging into this issue. I'll try to provide
some background to help you understand the existing system. Feel free to ask
more questions.
bq. Instead of tearing down the iterator at the end of every batch size is it
possible to put it in a suspended state so that when the iterator comes back
up, all session state is restored?
A few things to consider about this.
* To preserve an iterator, you must preserve its data sources. Preserving
data sources that are no longer needed consumes resources.
* Clients may not always be well behaved, so the server will eventually time
things out and tear down the stack.
* Machine faults will lead to the iterator stack inevitably being torn down.
So users must still handle the cases where the iterator stack is torn down.
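To illustrate why a client-visible scan survives a tear down, here is a minimal sketch. It does not use any Accumulo API: {{TearDownSketch}}, {{readBatch}} and {{scanAll}} are hypothetical names, and a {{TreeMap}} stands in for a tablet's sorted data. Each batch is read from a freshly built view that starts just after the last key returned, which models re-creating the stack and seeking it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical simulation; none of these names are part of Accumulo.
class TearDownSketch {

    // Reads up to batchSize entries strictly after lastKey (from the start if null).
    // Every call builds a new view, modelling a newly created iterator stack
    // that is positioned using only the last key returned.
    static List<Map.Entry<String, String>> readBatch(
            NavigableMap<String, String> data, String lastKey, int batchSize) {
        NavigableMap<String, String> view =
                (lastKey == null) ? data : data.tailMap(lastKey, false);
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        for (Map.Entry<String, String> e : view.entrySet()) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(e);
        }
        return batch;
    }

    // Scans all keys, discarding the "stack" after every batch.
    static List<String> scanAll(NavigableMap<String, String> data, int batchSize) {
        List<String> keys = new ArrayList<>();
        String lastKey = null;
        while (true) {
            List<Map.Entry<String, String>> batch = readBatch(data, lastKey, batchSize);
            if (batch.isEmpty()) {
                return keys;
            }
            for (Map.Entry<String, String> e : batch) {
                keys.add(e.getKey());
            }
            // The only state carried across the tear down is the last key returned.
            lastKey = keys.get(keys.size() - 1);
        }
    }
}
```

Any progress an iterator makes that is not reflected in the last key returned is lost in this model, which is the cost the issue description is concerned with.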
bq. It looks like if I use a scanner and enable Isolation the tear down process
does not occur. This may be a coincident though. This would work but
BatchScanner do not have this functionality.
Some background on
[Isolation|http://svn.apache.org/viewvc/accumulo/tags/1.4.2/docs/isolation.html?revision=1409793&view=co].
Isolation is only guaranteed for a row. The scanner can tear down the
iterator stack after a row boundary is passed. It currently does this if new
data sources are available. In the case of a machine fault, an IsolationException
is thrown. For the scanner, when you get an isolation exception you can just
restart after the last complete row. For the batch scanner, it's not clear what
a good recovery strategy would be, since the batch scanner reads from many
machines in parallel and commingles data. If the batch scanner had an isolation
exception, you would probably just have to restart the entire batch scan.
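The scanner recovery strategy described above (drop the partial row, restart after the last complete row) can be sketched as follows. This is a self-contained simulation under stated assumptions, not Accumulo client code: {{IsolationFault}}, {{Cell}}, {{FlakySource}} and {{readRows}} are all hypothetical stand-ins, and the fault is injected once at a chosen read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; none of these types are Accumulo classes.
class IsolationRetrySketch {

    // Stand-in for the exception raised when isolation cannot be maintained.
    static class IsolationFault extends RuntimeException {
    }

    // One key/value, reduced to a row id and a value.
    static final class Cell {
        final String row;
        final String value;

        Cell(String row, String value) {
            this.row = row;
            this.value = value;
        }
    }

    // Sorted cells that fail exactly once, on the failAtRead-th read.
    static class FlakySource {
        final List<Cell> cells;
        final int failAtRead;
        int reads = 0;

        FlakySource(List<Cell> cells, int failAtRead) {
            this.cells = cells;
            this.failAtRead = failAtRead;
        }

        Cell read(int index) {
            if (++reads == failAtRead) {
                throw new IsolationFault();
            }
            return cells.get(index);
        }

        // Index of the first cell whose row sorts strictly after the given row.
        int seekAfter(String row) {
            int i = 0;
            while (row != null && i < cells.size()
                    && cells.get(i).row.compareTo(row) <= 0) {
                i++;
            }
            return i;
        }
    }

    // Returns only complete rows. On a fault, the partially buffered row is
    // discarded and the scan restarts after the last complete row.
    static List<List<Cell>> readRows(FlakySource src) {
        List<List<Cell>> rows = new ArrayList<>();
        String lastCompleteRow = null;
        while (true) {
            List<Cell> current = new ArrayList<>();
            try {
                for (int i = src.seekAfter(lastCompleteRow); i < src.cells.size(); i++) {
                    Cell c = src.read(i);
                    if (!current.isEmpty() && !current.get(0).row.equals(c.row)) {
                        rows.add(current);
                        lastCompleteRow = current.get(0).row;
                        current = new ArrayList<>();
                    }
                    current.add(c);
                }
                if (!current.isEmpty()) {
                    rows.add(current);
                }
                return rows;
            } catch (IsolationFault fault) {
                // Partial row is dropped; loop again from lastCompleteRow.
            }
        }
    }
}
```

The sketch works because a single reader sees rows in order, so "last complete row" is well defined. With results commingled from many tablets in parallel there is no single such position, which is why restarting the whole batch scan is the fallback.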
bq. Sending result through a WholeRowIterator is does not prevent the tear down
process.
It partially does. When there is no isolation, the iterator stack can be torn
down after it returns any key value. The WholeRowIterator reads an entire row
before returning a key value. Therefore it will not be torn down while reading
a row. It can certainly be torn down between rows.
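As a small illustration of the difference, the sketch below computes where a tear down could happen when a whole row is returned as one unit. It is a hypothetical helper ({{WholeRowSketch.rowBoundaries}} is not an Accumulo method) that takes the row id of each cell in sorted order. Without row grouping, every cell index would be a possible tear-down point; with it, only the last cell of each row is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Accumulo.
class WholeRowSketch {

    // Given the row id of each cell in sorted order, returns the indexes of the
    // cells after which the stack could be torn down if rows are returned whole.
    static List<Integer> rowBoundaries(List<String> rowOfEachCell) {
        List<Integer> ends = new ArrayList<>();
        for (int i = 0; i < rowOfEachCell.size(); i++) {
            boolean lastCell = (i == rowOfEachCell.size() - 1);
            if (lastCell || !rowOfEachCell.get(i).equals(rowOfEachCell.get(i + 1))) {
                ends.add(i);
            }
        }
        return ends;
    }
}
```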
bq. For a Scanner one can just call getBatchSize() but again BatchScanners do
not have this functionality.
A batch scan potentially batches data across tablets. The iterator stack is
created for each tablet. Just something to consider.
> consider augmenting session state with "breadcrumbs"
> ----------------------------------------------------
>
> Key: ACCUMULO-625
> URL: https://issues.apache.org/jira/browse/ACCUMULO-625
> Project: Accumulo
> Issue Type: Improvement
> Components: tserver
> Reporter: Eric Newton
> Assignee: Keith Turner
>
> Presently, the iterator stack can be created and destroyed at the whim of the
> tserver and its buffering needs. In complex iterations, lower-level
> iterators can make significant progress which is not inherently obvious in
> any returned key. When the iterator stack is re-created to continue a query,
> the last key returned is used to {{seek()}} the iterators. Lower-level
> iterators must re-scan their data to move back to the old position.
> Consider a mechanism to save progress beyond the last key returned.
>