Tim Williams created BLUR-344:
---------------------------------

             Summary: Expose a Scanner capability that allows various implementations (e.g. ExportScanner)
                 Key: BLUR-344
                 URL: https://issues.apache.org/jira/browse/BLUR-344
             Project: Apache Blur
          Issue Type: New Feature
          Components: Blur Console
            Reporter: Tim Williams
            Assignee: Tim Williams


Blur should support "scanner" plugins that, given a query, are handed all the 
records matching that query.  From the Thrift API's perspective, these would be 
asynchronous, long-running calls.
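
To make the asynchronous shape concrete, here is a minimal Java sketch of how a 
server might track scans behind the calls proposed further down.  ScanRegistry, 
its Status enum, and the use of a plain Runnable for the scan work are all 
hypothetical names for illustration, not existing Blur code.  A cancelled scan 
is reported as ERROR only because the proposed status enum has no cancelled 
state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical server-side bookkeeping for long-running scans.
final class ScanRegistry {
  enum Status { COMPLETE, RUNNING, ERROR }

  // Daemon threads so an idle registry never keeps the JVM alive.
  private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "scan-worker");
    t.setDaemon(true);
    return t;
  });
  private final Map<String, Future<?>> scans = new ConcurrentHashMap<>();

  // Starts the scan in the background and returns immediately.
  void scan(String scanId, Runnable work) {
    scans.compute(scanId, (id, existing) -> {
      if (existing != null) {
        throw new IllegalArgumentException("scan id already in use: " + id);
      }
      return pool.submit(work);
    });
  }

  // Ids of every scan the registry knows about, finished or not.
  List<String> scanList() {
    return new ArrayList<>(scans.keySet());
  }

  Status statusScan(String scanId) throws InterruptedException {
    Future<?> f = lookup(scanId);
    if (!f.isDone()) {
      return Status.RUNNING;
    }
    if (f.isCancelled()) {
      return Status.ERROR; // assumption: no dedicated cancelled state
    }
    try {
      f.get(); // already done, so this does not block
      return Status.COMPLETE;
    } catch (ExecutionException e) {
      return Status.ERROR;
    }
  }

  // Interrupts the worker thread if the scan is still running.
  void cancelScan(String scanId) {
    lookup(scanId).cancel(true);
  }

  void shutdown() {
    pool.shutdownNow();
  }

  private Future<?> lookup(String scanId) {
    Future<?> f = scans.get(scanId);
    if (f == null) {
      throw new IllegalArgumentException("unknown scan id: " + scanId);
    }
    return f;
  }
}
```

The Thrift handler for each call would then be a thin wrapper over the matching 
method here; how finished scans get expired from the map is left open.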

The scanner would essentially be given a collector of the hits, each carrying 
the fields defined by the passed-in selector.

The client would ask for a scan, then poll for the status periodically and - 
depending on the Scanner implementation - pick up the results in whatever form 
they were requested.
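
A rough Java sketch of that ask-then-poll flow from the client side follows.  
ScanClient is a hypothetical stand-in for the generated Thrift client (it takes 
a bare scan id rather than a full ScannerQuery to keep the example 
self-contained), and the poll interval and timeout parameters are assumptions 
rather than part of the proposal.

```java
import java.util.concurrent.TimeoutException;

// Mirrors the ScanStatus enum proposed in this issue.
enum ScanStatus { COMPLETE, RUNNING, ERROR }

// Hypothetical client-side view of the proposed calls; not an existing Blur interface.
interface ScanClient {
  void scan(String scanId) throws Exception;
  ScanStatus statusScan(String scanId) throws Exception;
  void cancelScan(String scanId) throws Exception;
}

final class ScanPoller {
  // Starts a scan, then polls until it leaves RUNNING or the timeout passes.
  static ScanStatus awaitScan(ScanClient client, String scanId,
                              long pollMillis, long timeoutMillis) throws Exception {
    client.scan(scanId);
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (true) {
      ScanStatus status = client.statusScan(scanId);
      if (status != ScanStatus.RUNNING) {
        return status; // COMPLETE or ERROR
      }
      if (System.currentTimeMillis() >= deadline) {
        // Give up and ask the server to stop the work.
        client.cancelScan(scanId);
        throw new TimeoutException("scan " + scanId + " still running after "
            + timeoutMillis + " ms");
      }
      Thread.sleep(pollMillis);
    }
  }
}
```

Once awaitScan returns COMPLETE, the client would pick up the output wherever 
the chosen Scanner put it, for example in the directory handed to an export.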

For a concrete implementation, think of export.  The ExportScanner would be 
given a location in HDFS, scan over all the results, and drop them in that 
directory - maybe in a particular requested form.  The Scanner pattern could 
have many other useful implementations though - for example, inserting a subset 
of the data into a new Blur Table.
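
As an illustration only, an export-style scanner might look like the Java sketch 
below.  RecordScanner and LocalExportScanner are hypothetical names, the 
tab-separated output and part file name are assumptions, and a local directory 
written through java.nio.file stands in for HDFS so the example stays 
self-contained.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plugin contract: the scan hands each matching record to collect().
interface RecordScanner extends AutoCloseable {
  void collect(String rowId, Map<String, String> fields) throws IOException;

  @Override
  void close() throws IOException;
}

// Writes one tab-separated line per record into a single file in the target directory.
final class LocalExportScanner implements RecordScanner {
  private final BufferedWriter out;
  private long count;

  LocalExportScanner(Path directory) throws IOException {
    Files.createDirectories(directory);
    out = Files.newBufferedWriter(
        directory.resolve("part-00000.tsv"), StandardCharsets.UTF_8);
  }

  @Override
  public void collect(String rowId, Map<String, String> fields) throws IOException {
    StringBuilder line = new StringBuilder(rowId);
    // Sort by field name so the output is stable across runs.
    for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
      line.append('\t').append(e.getKey()).append('=').append(e.getValue());
    }
    out.write(line.toString());
    out.newLine();
    count++;
  }

  // Number of records written so far.
  long count() {
    return count;
  }

  @Override
  public void close() throws IOException {
    out.close();
  }
}
```

A scanner that loads a subset into a new table would implement the same 
interface and replace the file write with a mutate call.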

Here are some client API thoughts:
{code}
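// Describes a scan request: what to match, which fields to return,
// and which Scanner implementation should receive the hits.
struct ScannerQuery {
  1:Query query,
  2:Selector selector,
  3:string id,
  4:string userContext,
  5:string scannerName,
  6:i64 startTime = 0,
  7:map<string,string> properties
}

enum ScanStatus {
  COMPLETE,
  RUNNING,
  ERROR
}

// Proposed service calls:

  // Starts a scan.
  void scan(
    1:ScannerQuery scannerQuery
  ) throws (1:BlurException ex)

  // Lists scans.
  list<string> scanList(
  ) throws (1:BlurException ex)

  // Reports the status of a scan.
  ScanStatus statusScan(
    1:string scanId
  ) throws (1:BlurException ex)

  // Cancels a scan.
  void cancelScan(
    1:string scanId
  ) throws (1:BlurException ex)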
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
