Re: Getting started - sharding data by customer, and hadoop version requirements.

James Kebinger Fri, 21 Dec 2012 08:07:37 -0800

Thanks, that's exactly our use case: we always search by customer id. We never
need to query across customers, so the separate index route sounds good. Is there a
practical limit to the number of indexes a Blur instance can maintain? Many
of them would be pretty small, but we'd have tens of thousands of each.
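
To make the two layouts under discussion concrete, here is a minimal sketch in
Python. It is only an illustration: the helper names, the "users" prefix, and
the customer_id field name are assumptions, and nothing here calls the Blur
API. The second helper uses plain Lucene-style query syntax.

```python
# Hypothetical helpers for comparing the two layouts; not part of Blur.

def table_for_customer(customer_id: int, prefix: str = "users") -> str:
    """Layout A: one table per customer, e.g. users-1, users-123."""
    return f"{prefix}-{customer_id}"


def scoped_query(customer_id: int, user_query: str) -> str:
    """Layout B: one shared table, every query ANDed with a customer filter.

    The field name customer_id is an assumption for illustration.
    """
    return f"+customer_id:{customer_id} +({user_query})"


if __name__ == "__main__":
    print(table_for_customer(123))
    print(scoped_query(123, "name:smith"))
```

With layout A the customer is selected by picking the table; with layout B the
filter clause has to be added to every query, but cross-customer searches stay
possible.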


We're on CDH3 now, but moving up to CDH4 in the new year. Is Blur supported
there yet?




On Fri, Dec 21, 2012 at 10:59 AM, Garrett Barton wrote:

> If I understand you correctly you have data from multiple customers
> (denoted by a customer_id) and you only perform a search against a single
> customer at a time?  If that's the case, the separate index route might be a
> good idea, as you can rebuild them separately and potentially model them
> differently if you need to.  Having said that, if you also
> occasionally want to search across customers, then you would want them all
> in a single index.
>
> I have Blur 1.x running on CDH3U5, and I think it will work back down to
> CDH3U2 at least; that's Hadoop 0.20 in both cases.  I have not tried 0.23,
> though I will need to soon.
>
>
> On Fri, Dec 21, 2012 at 10:51 AM, James Kebinger wrote:
>
> > Hello, I'm hoping to kick the tires on Apache Blur in the near future. I
> > have a couple of quick questions before I set out.
> >
> > What version(s) of Hadoop are required/supported at present?
> >
> > We have lots of data to index, but we always search within a particular
> > customer's data set. Would the best practice be to put all of the data in
> > one table and have the customer id in all of the queries, or build separate
> > tables for each customer_id (like users-1, users-123, etc.)?
> >
> > Thanks, and happy holidays!
> >
> > -James Kebinger
> >
>
