Re: Getting a proper ID value into every document

Erick Erickson Fri, 05 Jun 2015 07:44:31 -0700

My first recommendation, of course, would be to re-index the corpus
with a new field. Frankly, if that's possible, it would probably be
less effort and less error-prone than trying to hack in an ID after
the fact.
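
For what it's worth, here is a rough sketch of the ID side of such a
re-index, in plain Java with no Lucene calls at all. The names
StableIdAllocator, nextId and assign are made up for illustration, and
the string keys stand in for whatever identifies your source records.
In a real re-index you would presumably store each ID on the document
both as an indexed term and as a numeric docValues field before adding
it, so that a later updateNumericDocValue(Term, ...) has a term to
match. That part is an assumption about your setup, not something I
have tested.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hands out application-level IDs that do not depend on Lucene's internal doc IDs. */
final class StableIdAllocator {
    private final AtomicLong next;

    /** firstId is the first ID to hand out, e.g. one past the highest ID already in use. */
    StableIdAllocator(long firstId) {
        this.next = new AtomicLong(firstId);
    }

    /** Returns a fresh ID; safe to call from several indexing threads. */
    long nextId() {
        return next.getAndIncrement();
    }

    /** Assigns one ID per source record, in iteration order (hypothetical helper). */
    static Map<String, Long> assign(List<String> recordKeys, StableIdAllocator ids) {
        Map<String, Long> assigned = new LinkedHashMap<>();
        for (String key : recordKeys) {
            // In the real re-index, this is where the ID would be put on the
            // document before it is handed to the IndexWriter.
            assigned.put(key, ids.nextId());
        }
        return assigned;
    }

    public static void main(String[] args) {
        StableIdAllocator ids = new StableIdAllocator(0);
        Map<String, Long> assigned = assign(Arrays.asList("a.txt", "b.txt", "c.txt"), ids);
        for (Map.Entry<String, Long> e : assigned.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The point is only that the IDs come from your own counter, so they
survive merges and deletes, which Lucene's doc IDs do not.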


If you cannot do this for whatever reason, I vaguely remember someone
posting a link to a program they'd put together to do this for a
docValues field. You'd have to search the archives to find it.

bq: Since Lucene is now (more or less?) using a separate index per field

Not quite sure what you mean here. Do you mean docValues?

Not a lot of help, but at least an idea.
Erick

On Thu, Jun 4, 2015 at 10:20 PM, Trejkaz <trej...@trypticon.org> wrote:
> Hi all.
>
> We had been going for the longest time abusing Lucene's doc IDs as our
> own IDs and of course all our filters still work like this. But at the
> moment, we're looking at ways to break our dependencies on this.
>
> One of the motivators for this is the outright removal of FieldCache
> in Lucene 5. (Yes, I see it's there, being used by UninvertingReader,
> but even UninvertingReader won't let us use the custom parsers we were
> previously relying on to parse values in older fields.)
>
> I know we now have this:
>
>     public void updateNumericDocValue(Term term, String field, long value)
>         throws IOException
>
> So to add an ID field, I would have to already have an ID field. :(
>
> Since Lucene is now (more or less?) using a separate index per field,
> maybe there is a way to directly add this field?
>
> I don't know if this is down the right path, but it seems like it
> would be something like...
>
> try (Directory directory = FSDirectory.open(Paths.get("/Data/BreakMe")))
> {
>     SegmentInfo segmentInfo = ???;
>     FieldInfos fieldInfos = ???;
>
>     SegmentWriteState writeState = new SegmentWriteState(null, directory,
>                                                          segmentInfo, fieldInfos,
>                                                          null, IOContext.DEFAULT);
>
>     try (DocValuesConsumer consumer =
>              new Lucene50DocValuesFormat().fieldsConsumer(writeState))
>     {
>         int number = ???;
>         long dvGen = ???;
>         Map<String, String> attributes = ???;
>
>         FieldInfo field = new FieldInfo("docid", number, false, true, false,
>                                         IndexOptions.DOCS, DocValuesType.NUMERIC,
>                                         dvGen, attributes);
>         Iterable<Number> values = () -> IntStream.range(0, 500)
>                                                  .mapToObj(i -> (Number) i)
>                                                  .iterator();
>         consumer.addNumericField(field, values);
>     }
> }
>
> But there are a *lot* of values that I wouldn't have any idea how to get.
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>

