[
https://issues.apache.org/jira/browse/BLUR-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491137#comment-13491137
]
Aaron McCurry commented on BLUR-18:
-----------------------------------
I don't think that the BlurDocLocation needs to be Comparable (or to implement
a RawComparator). It's really there to let you know what document you are
reading at that moment. The reasoning: when each InputSplit opens a session on a
given shard server, the act of opening the session creates a temporary snapshot
of the indexes. This snapshot guarantees that the document ids will not change
while the session is open. So the document location is just the shard index
for the given table plus the internal Lucene document id. After the session is
closed, or if another session is created before or after the InputSplit creates
its session, the internal Lucene document ids may have changed. This is due to
near real-time updates and/or merges that have taken effect.
In any event, the document location is only valid while the session is open. I
just had a thought: should we just take the two integers (shard index + Lucene
internal document id) and make them a single long value? Then we could just
use LongWritable, and I could simplify the Thrift API to represent this as
well. What do you guys think?
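
To make the single-long idea concrete, here is a minimal sketch of one way the
two ints could be packed, with the shard index in the high 32 bits and the
Lucene document id in the low 32 bits. The class and method names are
hypothetical and not part of Blur; this only illustrates the bit layout, not a
proposed implementation.

```java
// Hypothetical helper, not an existing Blur class.
final class DocLocationPacking {

    private DocLocationPacking() {
        // static helpers only
    }

    // Shard index goes in the high 32 bits, Lucene doc id in the low 32 bits.
    // The mask stops sign extension of docId from overwriting the high bits.
    static long pack(int shardIndex, int docId) {
        return ((long) shardIndex << 32) | (docId & 0xFFFFFFFFL);
    }

    // Recover the shard index from the high 32 bits.
    static int shardIndex(long packed) {
        return (int) (packed >>> 32);
    }

    // Recover the Lucene doc id; the narrowing cast keeps the low 32 bits.
    static int docId(long packed) {
        return (int) packed;
    }
}
```

The packed value could then be carried in a LongWritable, and a reader would
call shardIndex(...) and docId(...) to get the two parts back. As with the
two-int form, the value would only be meaningful while the session that
produced it is open.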
> Rework the MapReduce Library to implement Input/OutputFormats
> -------------------------------------------------------------
>
> Key: BLUR-18
> URL: https://issues.apache.org/jira/browse/BLUR-18
> Project: Apache Blur
> Issue Type: Improvement
> Reporter: Aaron McCurry
> Fix For: 0.2.0
>
> Attachments: 0001-BLUR-ID-18-Created-New-Version-of-Files.patch,
> 0001-BLUR-ID-18-New-Writables.patch
>
>
> Currently the only way to implement indexing is to use the BlurReducer. A
> better way to implement this would be to support Hadoop InputFormats and
> OutputFormats in both the new and old APIs. This would allow easier
> integration with other Hadoop projects such as Hive and Pig.