[jira] [Commented] (ACCUMULO-625) consider augmenting session state with "breadcrumbs"

Keith Turner (JIRA) Wed, 09 Jan 2013 06:18:15 -0800

    [ 
https://issues.apache.org/jira/browse/ACCUMULO-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548529#comment-13548529
 ]


Keith Turner commented on ACCUMULO-625:
---------------------------------------

An it sounds like you are really digging into this issue.  I'll try to provide 
some background to help you understand the existing system.  Feel free to ask 
more questions.

bq. Instead of tearing down the iterator at the end of every batch size is it 
possible to put it in a suspended state so that when the iterator comes back 
up, all session state is restored?

A few things to consider about this.

 * To preserve an iterator, you must preserve its data sources.  Preserving 
data sources that are no longer needed consumes resources.  
 * Clients may not always be well behaved, so the server will eventually time 
things out and tear down the stack.
 * Machine faults will lead to the iterator stack inevitably being torn down.

So the user must still handle these cases of the iterator stack being torn down.
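
As a rough illustration of what handling a tear down means, here is a minimal 
sketch in plain Java.  It is a toy model, not Accumulo code: the class 
ReseekSketch, the method scanWithTeardowns and the use of a TreeMap as the 
"table" are all made up for this example.  The only state that survives a 
simulated tear down is the last key returned, and the rebuilt source is 
positioned just past it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model (hypothetical, not Accumulo code): a scan that survives tear downs by re-seeking. */
final class ReseekSketch {

    /**
     * Reads every entry of the table, rebuilding the "iterator" after every
     * batchSize entries to mimic the server tearing the stack down.
     */
    static List<String> scanWithTeardowns(NavigableMap<String, String> table, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<String> results = new ArrayList<>();
        String lastKey = null; // the only state that survives a tear down
        while (true) {
            // Re-create the source and seek just past the last key returned.
            NavigableMap<String, String> view =
                (lastKey == null) ? table : table.tailMap(lastKey, false);
            if (view.isEmpty()) {
                return results;
            }
            int returned = 0;
            for (Map.Entry<String, String> e : view.entrySet()) {
                if (returned == batchSize) {
                    break; // simulated tear down at the end of a batch
                }
                results.add(e.getKey() + "=" + e.getValue());
                lastKey = e.getKey();
                returned++;
            }
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("a", "1");
        table.put("b", "2");
        table.put("c", "3");
        // The result is the same no matter how often the stack is rebuilt.
        System.out.println(scanWithTeardowns(table, 2));
    }
}
```

Anything an iterator computed that is not recoverable from the last key 
returned is lost in this model, which is the cost this issue is about.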

bq. It looks like if I use a scanner and enable Isolation the tear down process 
does not occur. This may be a coincident though. This would work but 
BatchScanner do not have this functionality.

Some background on 
[Isolation|http://svn.apache.org/viewvc/accumulo/tags/1.4.2/docs/isolation.html?revision=1409793&view=co].
  Isolation is only guaranteed for a row.  The scanner can tear down the 
iterator stack after a row boundary is passed.  It currently does this if new 
data sources are available.  In the case of a machine fault, an 
IsolationException is thrown.  For the scanner, when you get an isolation 
exception you can just restart after the last complete row.  For the batch 
scanner it's not clear what a good recovery strategy would be, since the batch 
scanner reads from many machines in parallel and commingles data.  If the 
batch scanner had an isolation exception, you would probably just have to 
restart the entire batch scan.
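
The "restart after the last complete row" recovery for a scanner can be 
sketched roughly as follows.  This is a self-contained simulation and makes no 
claim about how the real client code works: IsolationRetrySketch, 
SimulatedIsolationException and the failAtIndex fault injection are 
hypothetical names invented for the example.  Entries of the row in progress 
are buffered, and a fault discards only that buffer.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Accumulo code) of restarting after the last complete row. */
final class IsolationRetrySketch {

    /** Hypothetical stand-in for an isolation failure during a scan. */
    static final class SimulatedIsolationException extends RuntimeException {}

    /**
     * Scans sorted {row, value} pairs.  A fault is injected once, just before
     * the entry at failAtIndex is read (use -1 for no fault).
     */
    static List<String> scanCompleteRows(List<String[]> sorted, int failAtIndex) {
        List<String> done = new ArrayList<>(); // entries of complete rows only
        String lastCompleteRow = null;
        boolean failed = false;                // inject the fault at most once
        while (true) {
            List<String> pending = new ArrayList<>(); // buffer for the row in progress
            String currentRow = null;
            try {
                for (int i = 0; i < sorted.size(); i++) {
                    String row = sorted.get(i)[0];
                    // "Seek": skip everything up to and including the last complete row.
                    if (lastCompleteRow != null && row.compareTo(lastCompleteRow) <= 0) {
                        continue;
                    }
                    if (!failed && i == failAtIndex) {
                        failed = true;
                        throw new SimulatedIsolationException();
                    }
                    if (currentRow != null && !row.equals(currentRow)) {
                        // Row boundary passed: the buffered row is complete.
                        done.addAll(pending);
                        pending.clear();
                        lastCompleteRow = currentRow;
                    }
                    currentRow = row;
                    pending.add(row + ":" + sorted.get(i)[1]);
                }
                done.addAll(pending); // the final row is complete at end of data
                return done;
            } catch (SimulatedIsolationException e) {
                // Drop the partial row; the loop restarts after lastCompleteRow.
            }
        }
    }
}
```

With a single sorted stream this is straightforward.  The batch scanner has no 
single "last complete row" to restart from, which is why the same approach 
does not carry over directly.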

bq. Sending result through a WholeRowIterator is does not prevent the tear down 
process.

It partially does.  When there is no isolation, the iterator stack can be torn 
down after it returns any key value.  The WholeRowIterator reads an entire row 
before returning a key value.  Therefore it will not be torn down while reading 
a row.  It can certainly be torn down between rows.
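
To make the effect concrete, here is a small sketch of the idea in plain Java. 
It does not use the real WholeRowIterator or its encoding; WholeRowSketch and 
groupByRow are hypothetical names.  Sorted entries are grouped so that one 
unit is returned per row.  Since a tear down can only happen after something 
is returned, the possible tear down points drop from one per key value to one 
per row.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Accumulo code) of returning one unit per row. */
final class WholeRowSketch {

    /** Groups sorted {row, value} pairs into one list per row. */
    static List<List<String>> groupByRow(List<String[]> sorted) {
        List<List<String>> rows = new ArrayList<>();
        String currentRow = null;
        List<String> current = null;
        for (String[] kv : sorted) {
            if (!kv[0].equals(currentRow)) {
                // Row boundary: start a new unit.  Only here, between
                // returned units, could the stack be torn down.
                current = new ArrayList<>();
                rows.add(current);
                currentRow = kv[0];
            }
            current.add(kv[0] + ":" + kv[1]);
        }
        return rows;
    }
}
```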

bq. For a Scanner one can just call getBatchSize() but again BatchScanners do 
not have this functionality.

A batch scan potentially batches data across tablets.  The iterator stack is 
created for each tablet.  Just something to consider.

                
> consider augmenting session state with "breadcrumbs"
> ----------------------------------------------------
>
>                 Key: ACCUMULO-625
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-625
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Eric Newton
>            Assignee: Keith Turner
>
> Presently, the iterator stack can be created and destroyed at the whim of the 
> tserver and its buffering needs.  In complex iterations, lower-level 
> iterators can make significant progress which is not inherently obvious in 
> any returned key.  When the iterator stack is re-created to continue a query, 
> the last key returned is used to {{seek()}} the iterators.  Lower-level 
> iterators must re-scan their data to move back to the old position.
> Consider a mechanism to save progress beyond the last key returned.
>   

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
