Tim Williams created BLUR-344:
---------------------------------
Summary: Expose a Scanner capability that allows various
implementations (e.g. ExportScanner)
Key: BLUR-344
URL: https://issues.apache.org/jira/browse/BLUR-344
Project: Apache Blur
Issue Type: New Feature
Components: Blur Console
Reporter: Tim Williams
Assignee: Tim Williams
Blur should have the ability to have "scanner" plugins that, given a query, are
handed all the matching records of the query. From the Thrift API perspective,
these would be asynchronous, long-running calls.
The scanner would essentially be given a collector of the hits, with the fields
defined by the passed-in selector.
The client would ask for a scan, then poll for the status periodically and -
depending on the Scanner implementation - pick up the results in whatever form
they were requested.
For a concrete implementation, think of export. The ExportScanner would be
given a location in HDFS and scan over all the results and drop them in that
directory - maybe in a particular requested form. The Scanner pattern could
have many useful implementations though - for example, to insert a subset of
the data into a new Blur table.
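To make the plugin idea a bit more concrete, here is a rough sketch in Python (Blur itself is Java; Python is used only to keep the example short). Nothing here is an existing Blur API: the `Scanner` base class, its `collect`/`finish` methods, and `run_scan` are hypothetical names, a local directory stands in for the HDFS location, and JSON lines is just one possible "requested form".

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class Scanner(ABC):
    """Hypothetical plugin interface: receives every record matching the query."""

    @abstractmethod
    def collect(self, record: dict) -> None:
        """Called once per matching record, already limited to the selector's fields."""

    def finish(self) -> None:
        """Called after the last record; override to flush or close resources."""


class ExportScanner(Scanner):
    """Writes each record as one JSON line into a file under output_dir.

    Assumption: a local directory stands in for the HDFS location.
    """

    def __init__(self, output_dir: str, file_name: str = "part-00000.json"):
        os.makedirs(output_dir, exist_ok=True)
        self.path = os.path.join(output_dir, file_name)
        self._out = open(self.path, "w", encoding="utf-8")
        self.count = 0

    def collect(self, record: dict) -> None:
        self._out.write(json.dumps(record, sort_keys=True) + "\n")
        self.count += 1

    def finish(self) -> None:
        self._out.close()


def run_scan(hits, selector_fields, scanner: Scanner) -> None:
    """Stand-in for the server side: project each hit onto the selected fields."""
    for hit in hits:
        scanner.collect({k: v for k, v in hit.items() if k in selector_fields})
    scanner.finish()


def demo_export() -> list:
    """Export two made-up hits and read the exported file back."""
    hits = [
        {"rowid": "r1", "name": "a", "secret": "x"},
        {"rowid": "r2", "name": "b", "secret": "y"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        scanner = ExportScanner(tmp)
        run_scan(hits, {"rowid", "name"}, scanner)
        with open(scanner.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]
```

A different implementation (say, one that writes the subset into another table) would only need to override `collect` and `finish`; the scan driver would not change.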
Here are some client API thoughts:
{code}
struct ScannerQuery {
  1:Query query,
  2:Selector selector,
  3:string id,
  4:string userContext,
  5:string scannerName,
  6:i64 startTime = 0,
  7:map<string,string> properties
}

enum ScanStatus {
  COMPLETE,
  RUNNING,
  ERROR
}

void scan(
  1:ScannerQuery scannerQuery
) throws (1:BlurException ex)

list<string> scanList(
) throws (1:BlurException ex)

ScanStatus statusScan(
  1:string scanId
) throws (1:BlurException ex)

void cancelScan(
  1:string scanId
) throws (1:BlurException ex)
{code}
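From the client side, the ask-then-poll flow might look roughly like the Python sketch below. It is not a real Blur client: `FakeBlurClient` is a hypothetical in-memory stand-in for a generated Thrift client, `wait_for_scan` is an illustrative helper, and it assumes that the `scanId` passed to `statusScan` and `cancelScan` is the `id` the client supplied in `ScannerQuery` (the proposal does not say so explicitly).

```python
import time
from enum import Enum


class ScanStatus(Enum):
    COMPLETE = 0
    RUNNING = 1
    ERROR = 2


class FakeBlurClient:
    """In-memory stand-in for a Thrift client (hypothetical).

    Each scan reports RUNNING for a fixed number of polls, then COMPLETE.
    Unknown or cancelled ids report ERROR; that is a choice made for this
    fake only, not something the proposal specifies.
    """

    def __init__(self, polls_until_done: int = 3):
        self._remaining = {}
        self._polls_until_done = polls_until_done

    def scan(self, scanner_query: dict) -> None:
        # Assumption: the client-supplied id identifies the scan later on.
        self._remaining[scanner_query["id"]] = self._polls_until_done

    def scanList(self) -> list:
        return sorted(self._remaining)

    def statusScan(self, scan_id: str) -> ScanStatus:
        if scan_id not in self._remaining:
            return ScanStatus.ERROR
        if self._remaining[scan_id] > 0:
            self._remaining[scan_id] -= 1
            return ScanStatus.RUNNING
        return ScanStatus.COMPLETE

    def cancelScan(self, scan_id: str) -> None:
        self._remaining.pop(scan_id, None)


def wait_for_scan(client, scan_id: str, poll_interval: float = 0.01,
                  timeout: float = 5.0) -> ScanStatus:
    """Poll statusScan until the scan leaves RUNNING; cancel it on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = client.statusScan(scan_id)
        if status is not ScanStatus.RUNNING:
            return status
        if time.monotonic() >= deadline:
            client.cancelScan(scan_id)
            raise TimeoutError(f"scan {scan_id} still running after {timeout}s")
        time.sleep(poll_interval)
```

For example, after `client.scan({"id": "scan-1", "scannerName": "export"})`, a call to `wait_for_scan(client, "scan-1")` blocks until the scan reports COMPLETE or ERROR. Picking up the results afterwards depends on the Scanner implementation, as described above.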