I think you can still achieve your desired outcome, but I'm not sure I fully
understand the use case.  Can you describe more clearly a specific example
of what you need to achieve?

You are correct that "joins" in lucene aren't really a strong point, but
this is often a by-product of thinking about Lucene in the same wy we think
about a database; which is a mistake (in my opinion).

I often have trouble trying to use Lucene in the same way I use a database
(joined queries etc).  Usually I need to re-think the way I am using Lucene
to avoid needing to do this.

On 8/19/06, Shaghayegh Sahebie <[EMAIL PROTECTED]> wrote:

thanks Jason and Steve;
maybe i didn't understand your solution well, but in this system a
document is refered many times (we have a refer description wich we should
index it also) and each time a document is refered i should update it in the
lucene index and so i should delete it and then index it again. and it means
deleting and indexing this document many times. i think indexing is time
consuming.
and as another question can lucene have different value for one of it's
fields? 'cause i refer a doc many time and i need many refer fields(but i
don't know how many) in one documents fields. and i can not e.g. concat
all the refers in one feld 'cause i may have other constraints on a refer
and doument also (e.g. give me the documents which they have word "foo" in
their refers and the refer of them which has this wordis on "yyyy/mm/dd"
date. if i concat the refers i may find a refer which has the word in it but
another refer of this document is on the given date. i mean i should know
which refer is on which date and also other fields of a refer other than
date. the same is true for attachments) on the date of a refer an

i think the main problem i got is that lucene can not handle joins and i
think i need joins.

regards
--Shaghayegh

Steven Rowe <[EMAIL PROTECTED]> wrote: As Jason says, you can structure each
Lucene document with one Field per
content type, and index all data that way.  The database is not required.

To address your search complexity concern, you can create queries that
search only those Field(s) the user wants -- there is no need to have a
Field for each possible combination of content type.

Steve

Jason Polites wrote:
> Maybe I'm not understanding your requirement, but this should be fairly
> simple in Lucene.
>
> Each document in your document management system would be represented by
a
> single Lucene document in the index.  Each lucene document will then
have
> several fields, each field representing the values of the "meta data"
> associated with your document in the document management system.
>
> For example:
>
> Lets say you have a document which has the following structure:
>
> Title: Sample Document
> Date: 01/01/2006
> Attachment: Some attachment (this is the content of the attachment)
> Attachment: Some other attachment
> Refers: Mr X
> Refers: Mr Y
>
> Your Lucene document would have the same structure.  All of these items
in
> your "real" document would simply be Fields in the lucene document.
>
> In the case of your attachments, you could also consider indexing them
> separately (as well).  This way users could search for attachments
without
> needing the documents to which they are attached.
>
> If your attachments ARE documents (that is, your just have a foreign key
> style relationship between two documents), then you would simple index
each
> "real" document as a separate document in lucene and add some sort of
> reference field which contains the ID of all related documents.
>
> For Example:
> -------------------
> ID: 123
> Title: Sample Document
> Date: 01/01/2006
> Attachment: 456
> Attachment: 789
> Refers: Mr X
> Refers: Mr Y
> --------------------
> ID: 456
> Title: Other Document
> Date:.. etc etc...
>
> The one thing to be mindful of is re-indexing existing documents.  If
you
> have document that is already indexed and you want to make a change (eg
you
> want to add a new "refers" value), then you need to re-index the entire
> document.  This means you need to either "store" all fields you want to
> keep
> during re-indexing (which is typically all of them), or you need to
> re-index
> the document from its source. Storing all the data in the index can have
> adverse effects on the performance of the index however. (hope this
makes
> sense).
>
>
>
> On 8/12/06, Shaghayegh Sahebie  wrote:
>>
>> Hi all;
>> We have got a Document management system and we want to build a search
on
>> it. We have tree kind of content in our system: Refers, Documents and
>> Attachments. A document can have multiple attachments and can be
>> Refered to
>> many users.
>> Our users want to be able to search on documents attachments and
refers.
>> for example they want to search the Documents which are created at
>> "2006/07/06" date and have the word "Lucene" in it or their Refers and
>> are
>> Refered to Mr.x.
>> Our users want to be ale to search in all 8 possible selections of
>> Document, Refer and Attachment, I mean they want to be able to search
>> just
>> in Refers, in both Refers and Documents, ...
>> How can we handle it?
>> I thaught to store diferent kinds of Docs in a DB, search in the DB at
>> first and search in Lucene based on DB results and phrases given to
>> search
>> (Handling Document, Refer or Attachments parts in a DB search). But
>> the DB
>> results maybe so big and i don't know if a Lucene query can have these
>> much
>> of search Terms.
>> Another way is to Index each document, refer and attachment in the
>> index 8
>> times(all the possible selections of Refer, Document and Attachment)
but
>> this way has lots of redundancy even more than 8 times! 'cause each
>> Document
>> is indexed "8 * Refer number of Document" times.
>> I really don't know what to do, Any suggestions Please?
>>
>> Thanks in advance


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




---------------------------------
Do you Yahoo!?
Everyone is raving about the  all-new Yahoo! Mail Beta.

Reply via email to