Re: Sharding Techniques

Samarendra Pratap Tue, 10 May 2011 00:43:43 -0700

Hi,
 Though we have 30 GB total index, size of the indexes that are used
in 75%-80% searches is 5 GB. and we have average search time around 700 ms.
(yes, we have optimized index).


Could someone please throw some light on my original doubt!!!
If I want to keep smaller indexes on different servers so that CPU and
memory may be optimized, how can I aggregate the results of a query from
each of the server. One thing I know is RMI which I studied a few years
back, but that was too slow (or i thought so that time). What are other
techniques?

Is 1 second a bad search time for following?
total index size: 30 GB
index size which is being used in 80% searches - 5 GB
number of fields - 40
most of the fields being numeric fields.
one big "contents" field with 500 - 1000 words.
3500 queries / second mostly on
on an average a query uses 7 fields (1 big 6 small) with 25-30 tokens

Are there any benchmarks from which I can compare the performance of my
application? Or any approximate formula which can guide me
calculating (using system parameters and index/search stats) the "best"
expected search time?

Thanks in advance

On Tue, May 10, 2011 at 9:59 AM, Ganesh <emailg...@yahoo.co.in> wrote:

> We are using similar technique as yours. We keep smaller indexes and use
> ParallelMultiSearcher to search across the index. Keeping smaller indexes is
> good as index and index optimzation would be faster.  There will be small
> delay while searching across the indexes.
>
> 1. What is your search time?
> 2. Is your index optimized?
>
> I have a doubt, If we keep the indexes size to 30 GB then each file size
> (fdt, fdx etc) would in GB's. Small addition or deletion to the file will
> not cause more IO as it has to skip those bytes and write it at the end of
> file.
>
> Regards
> Ganesh
>
>
>
> ----- Original Message -----
> From: "Samarendra Pratap" <samarz...@gmail.com>
> To: <java-user@lucene.apache.org>
> Sent: Monday, May 09, 2011 5:26 PM
> Subject: Sharding Techniques
>
>
> > Hi list,
> > We have an index directory of 30 GB which is divided into 3
> subdirectories
> > (idx1, idx2, idx3) which are again divided into 21 sub-subdirectories
> > (idx1-1, idx1-2, ...., idx2-1, ...., idx3-1, ...., idx3-21).
> >
> > We are running with java 1.6, lucene 2.9 (going to upgrade to 3.1 very
> > soon), linux (fedora core - kernel 2.6.17-13.1), reiserfs.
> >
> > We have almost 40 fields in each index (is it a bad to have so many
> > fields?). most of them are id based fields.
> > We are using 8 servers for search, and each of which receives
> approximately
> > 3000/hour queries in peak hour and search time of more than 1 second is
> > considered bad (is it really bad?) as per the business requirement.
> >
> > Since past few months we are experiencing issues (load and search time)
> on
> > our search servers, due to which I am looking for sharding techniques.
> Can
> > someone guide or give me pointers where i can read more and test?
> >
> > Keeping parts of indexes on different servers search on all of them and
> then
> > merging the results - what could be the best approach?
> >
> > Let me tell you that most queries use only 6-7 indexes and 4 - 5 fields
> (to
> > search for) but some queries (searching all the data) require all the
> > indexes and are primary cause of the performance degradation.
> >
> > Any suggestions/ideas are greatly appreciated. And further more will
> > sharding (or similar thing) really reduce search time? (load is a less
> > severe issue when compared to search time)
> >
> >
> > --
> > Regards,
> > Samar
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


-- 
Regards,
Samar

Re: Sharding Techniques

Reply via email to