Fields, Index segments and docIds (second Try)

Olivier Binda Tue, 29 Apr 2014 15:13:30 -0700

Hello.

Sorry to bring this up again. I don't want to be rude and I mean no disrespect, but after thinking it through today, I need, and would really love, to have the answer to the following question:

1) At Lucene indexing time, is it possible to rewrite a read-only index so that some fields are only found in some segments (and how)?

Uwe Schindler suggested using different indexes and a MultiReader for my needs, and that probably answers my second question, better formulated as "Is it possible to restrict an index to some of its segments?", since a CompositeReader with AtomicReaders (or a custom Directory) that reads the aforementioned segments might do the trick.

Yet, if I am not mistaken (please tell me if I am wrong), it doesn't solve my needs, as I have around 300000 documents of the following kind:


READ ONLY Document :
// common fields shipped with the App that aren't language related
A:
B:
C:
// fields shipped with the English package (a zip)
EN:
EN_Words:
EN_Sentences:
some DocValues
// fields shipped with the German package (a zip)
DE:
DE_Words:
DE_Sentences:
some DocValues
...
There might be hundreds of language packages that my users might use.


If I use different indexes
indexA for the common stuff,
indexEN for the English package,
indexDE for the German package,

For sure, I will be able to make a big index out of those by using a MultiReader, BUT it really makes a union of the three indexes (right?), which means I'll have 900000 documents, and the documents in indexA won't have any relation to the documents in indexEN (right?), unless I give each document an id in each index and do a join at query time. That is a big no-no, because I use a queryParser and users may enter queries like "A:gah AND (DE:schlaffen OR EN:sleep)".

Or am I mistaken, and is there a way to create a document in three different indexes so that its parts stay related through the same docId?
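
To make the concern concrete, here is a minimal sketch in plain Java (no Lucene dependency) of how I understand the numbering when sub-indexes are concatenated: each sub-index gets a starting offset and its local docIds are shifted by it. The class and method names are made up for illustration, and the sizes assume each of the three indexes holds the same 300000 logical documents.

```java
/** Toy model of docId numbering when sub-indexes are concatenated (plain Java, not Lucene code). */
class ConcatenatedDocIds {

    /** Returns the first global docId of each sub-index, with the total document count as the last entry. */
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i + 1] = bases[i] + maxDocs[i];
        }
        return bases;
    }

    /** Returns which sub-index owns the given global docId. */
    static int subIndex(int globalDocId, int[] bases) {
        if (globalDocId < 0) {
            throw new IllegalArgumentException("negative docId: " + globalDocId);
        }
        for (int i = 0; i + 1 < bases.length; i++) {
            if (globalDocId < bases[i + 1]) {
                return i;
            }
        }
        throw new IllegalArgumentException("docId out of range: " + globalDocId);
    }

    public static void main(String[] args) {
        // Assumed sizes: indexA, indexEN and indexDE each hold the same 300000 logical documents.
        int[] bases = docBases(new int[] {300000, 300000, 300000});
        System.out.println("total documents: " + bases[bases.length - 1]);

        // The same logical document (local docId 42) shows up three times, under three global docIds.
        int local = 42;
        for (int i = 0; i + 1 < bases.length; i++) {
            System.out.println("sub-index " + i + ": local " + local + " -> global " + (bases[i] + local));
        }
    }
}
```

In this model the three parts of one logical document end up under three unrelated global docIds, which is why a query spanning the A and EN fields cannot match a single document.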



My solution, if question 1 is possible:

In contrast, suppose I am able to build my index so that my READ ONLY Documents are stored in


SEGMENT 1
// common fields shipped with the App that aren't language related
A:
B:
C:

SEGMENT 2
// fields shipped with the English package (a zip)
EN:
EN_Words:
EN_Sentences:
some DocValues

SEGMENT 3
// fields shipped with the German package (a zip)
DE:
DE_Words:
DE_Sentences:
some DocValues

Then I only need to ship SEGMENT 1 in the App and let users download SEGMENT 2 or SEGMENT 3, depending on whether they want English or German, and use a composite reader with atomic readers (right?) to query my Frankenstein index with a queryParser.
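
The behaviour I am after can be sketched as a toy model in plain Java (again no Lucene code, and all names and sample values are hypothetical): each package contributes some fields, every field covers the same docIds, and a query is evaluated against whichever packages are installed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of field groups aligned by docId (plain Java, not Lucene code). */
class AlignedFieldGroups {

    /** Merges packages (field name -> value per docId) into one view; every field must cover maxDoc documents. */
    static Map<String, String[]> combine(List<Map<String, String[]>> packages, int maxDoc) {
        Map<String, String[]> view = new HashMap<>();
        for (Map<String, String[]> pkg : packages) {
            for (Map.Entry<String, String[]> field : pkg.entrySet()) {
                if (field.getValue().length != maxDoc) {
                    throw new IllegalArgumentException("field " + field.getKey() + " is not aligned");
                }
                view.put(field.getKey(), field.getValue());
            }
        }
        return view;
    }

    /** True if the document holds exactly this value in the field; a field from a missing package never matches. */
    static boolean term(Map<String, String[]> view, int docId, String field, String value) {
        String[] values = view.get(field);
        return values != null && value.equals(values[docId]);
    }

    /** Hard-coded equivalent of the query A:gah AND (DE:schlaffen OR EN:sleep). */
    static boolean matches(Map<String, String[]> view, int docId) {
        return term(view, docId, "A", "gah")
                && (term(view, docId, "DE", "schlaffen") || term(view, docId, "EN", "sleep"));
    }

    public static void main(String[] args) {
        // Made-up sample data: three documents, common fields plus the English package only.
        Map<String, String[]> common = new HashMap<>();
        common.put("A", new String[] {"gah", "gah", "foo"});
        Map<String, String[]> english = new HashMap<>();
        english.put("EN", new String[] {"sleep", "run", "sleep"});

        Map<String, String[]> view = combine(Arrays.asList(common, english), 3);
        for (int docId = 0; docId < 3; docId++) {
            System.out.println("doc " + docId + " matches: " + matches(view, docId));
        }
    }
}
```

The point of the sketch is that the German package is simply absent: the DE clause matches nothing, and the rest of the query still works against one document per docId.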

Also, in case question 1 is possible, I would really like to know whether it is possible to remap docIds at build time in a read-only index.

An application of this would be:

On day 1, I ship my app with 2 language packages: English and German (documents are uniquely identified by a docId... or by an external id, thanks to a docId <-> external id map).

On day 2, I ship an additional language package (French), because I'm able to build an index with English, German and French that has exactly the same docId for each document as the index shipped on day 1.
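
The property I would need from the day 2 build can be stated as a small check, sketched here in plain Java (not Lucene code). It assumes each build can export its docId -> external id table as an array indexed by docId; the class name, method names and sample ids are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy check that two index builds number their documents identically (plain Java, not Lucene code). */
class StableDocIds {

    /** Inverts a docId -> external id table; rejects duplicate external ids. */
    static Map<String, Integer> docIdByExternalId(String[] externalIdByDocId) {
        Map<String, Integer> result = new HashMap<>();
        for (int docId = 0; docId < externalIdByDocId.length; docId++) {
            if (result.put(externalIdByDocId[docId], docId) != null) {
                throw new IllegalArgumentException("duplicate external id: " + externalIdByDocId[docId]);
            }
        }
        return result;
    }

    /** True if both builds assign the same docId to every external id. */
    static boolean sameNumbering(String[] day1, String[] day2) {
        return Arrays.equals(day1, day2);
    }

    public static void main(String[] args) {
        // Made-up external ids; the array index is the docId in that build.
        String[] day1 = {"doc-a", "doc-b", "doc-c"};
        String[] day2 = {"doc-a", "doc-b", "doc-c"};
        String[] reordered = {"doc-b", "doc-a", "doc-c"};

        System.out.println("day 2 keeps day 1 docIds: " + sameNumbering(day1, day2));
        System.out.println("reordered build keeps day 1 docIds: " + sameNumbering(day1, reordered));
        System.out.println("docId of doc-c: " + docIdByExternalId(day1).get("doc-c"));
    }
}
```

If the check holds, a French package built on day 2 could be dropped next to the English and German packages shipped on day 1 without touching them.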



