Re: Getting started - sharding data by customer, and hadoop version requirements.

Garrett Barton Fri, 21 Dec 2012 07:59:39 -0800

If I understand you correctly, you have data from multiple customers
(denoted by a customer_id) and you only ever search against a single
customer at a time?  If that's the case, the separate-index route might be
a good idea: you can rebuild each index separately, and you can model them
differently if you ever need to.  Having said that, if you also
occasionally want to search across customers, then you would want them all
in a single index.
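
To make the two options concrete, here is a rough Python sketch of the
routing logic only.  Nothing in it is Blur API: the function names, the
"users" base name, and the customer_id field are made up for illustration,
and the shared-table variant simply assumes a Lucene-style query string.

```python
def table_for_customer(base, customer_id):
    """Separate-table option: each customer gets its own table/index,
    named like users-1, users-123 (hypothetical naming scheme)."""
    return f"{base}-{customer_id}"


def scoped_query(user_query, customer_id):
    """Single-table option: require a customer_id clause on every query.
    Assumes Lucene-style syntax, where a leading '+' marks a clause as
    mandatory; the field name customer_id is an assumption."""
    return f"+customer_id:{customer_id} +({user_query})"


if __name__ == "__main__":
    # Separate tables: pick the table, then run the query unchanged.
    print(table_for_customer("users", 123))
    # Single table: same table for everyone, query restricted by customer.
    print(scoped_query("name:smith", 123))
```

With separate tables, a cross-customer search means querying every table
and merging the results yourself; with the single table you just leave the
customer_id clause off.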

I have Blur 1.x running on CDH3U5, and I think it will work back down to
CDH3U2 at least; that's Hadoop 0.20 in both cases.  I have not tried 0.23,
though I will need to soon.

On Fri, Dec 21, 2012 at 10:51 AM, James Kebinger <[email protected]> wrote:

> Hello, I'm hoping to kick the tires on Apache Blur in the near future. I
> have a couple of quick questions before I set out.
>
> What version(s) of Hadoop are required/supported at present?
>
> We have lots of data to index, but we always search within a particular
> customer's data set. Would the best practice be to put all of the data in
> one table and have the customer id in all of the queries, or build separate
> tables for each customer_id (like users-1, users-123, etc.)?
>
> Thanks, and happy holidays!
>
> -James Kebinger
>
