Hi

Space: 700Mb vs 4.5Gb sounds way too big a difference.  Are you sure
you aren't loading multiple copies of the data or something like that?

Queries: a 20 times slowdown for a multi field query also sounds way
too big.  What do the simple and multi field queries look like?



--
Ian.


On Wed, Jan 21, 2009 at 1:39 PM, Anshul jain <anshul.j...@epfl.ch> wrote:
> Hi,
>
> I've indexed around half a million XML documents. Here is the document
> sample:
>
>  <a:attribute>
>
> <a:name>cogito:Name</a:name>
>
> <a:value>Alexander the Great</a:value>
>
> </a:attribute>
>
>
>
> <a:attribute>
>
> <a:name>cogito:domain</a:name>
>
> <a:value>ancient history</a:value>
>
> </a:attribute>
>
>
>
> <a:attribute>
>
> <a:name>cogito:first_sentence</a:name>
>
> <a:value>
>
> Alexander the Great (Greek: or Megas Alexandros; July 20 356 BC June 10 323
> BC), also known as Alexander III, was an ancient Greek king (basileus) of
> Macedon (336-323 BC).
>
> </a:value>
>
> </a:attribute>
>
>
> Average size of documents is around 4KB.
>
> There are a few performance issues I need help with. When I index documents,
> in a structured manner, using field information like:
> name: alexander the great
> domain: ancient history
> first_sentence: Alexander the Great (Greek: or Megas Alexandros; July 20 356
> BC June 10 323 BC), also known as Alexander III, was an ancient Greek king
> (basileus) of Macedon (336-323 BC).
> bagOfWords: alexander the great ancient history Alexander the Great (Greek:
> or Megas Alexandros; July 20 356 BC June 10 323 BC), also known as Alexander
> III, was an ancient Greek king (basileus) of Macedon (336-323 BC).
>
> bagOfWords is the field with all the text appended to it.
>
> I get the index size of 4.5 GB, but if I just append the text and store in
> one field
> like:
> value: alexander the great ancient history Alexander the Great (Greek: or
> Megas Alexandros; July 20 356 BC June 10 323 BC), also known as Alexander
> III, was an ancient Greek king (basileus) of Macedon (336-323 BC).
>
>  the index size is only 700 MB.. why is this happening?
>
>
>
> Also the query execution time of MultiFieldQueries is very slow, it is 20
> times slower than single field query. Is it normal,  what could be the
> reason for that?
>
> Thanks,
> Cheers,
> Anshul
>
> --
> Anshul Jain
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to