If I understand you correctly, you have data from multiple customers (identified by a customer_id) and you only ever search against a single customer at a time? If that's the case, the separate-index route might be a good idea: you can rebuild each index independently, and you can model them differently if the need arises. Having said that, if you also occasionally want to search across customers, then you would want them all in a single index.
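To make the two layouts concrete, here is a rough sketch of what the calling side might look like in each case. It is not Blur client code: `table_for_customer` and `scoped_query` are hypothetical helpers, the `users-<id>` naming is taken from the question in this thread, and the `+field:value` form is Lucene's classic query syntax, which I am assuming is what your queries would use.

```python
def table_for_customer(customer_id: int, prefix: str = "users") -> str:
    """Separate-table layout: pick the table from the customer id.

    The "users-<id>" naming follows the example in the question; it is
    a convention for illustration, not something Blur requires.
    """
    return f"{prefix}-{int(customer_id)}"


def scoped_query(customer_id: int, user_query: str) -> str:
    """Single-table layout: require the customer id in every query.

    Uses Lucene classic query syntax, where a leading '+' marks a
    clause as required. The field name customer_id is an assumption.
    """
    return f"+customer_id:{int(customer_id)} +({user_query})"


if __name__ == "__main__":
    # Separate tables: the query itself stays unchanged.
    print(table_for_customer(123))
    # One shared table: the customer filter is added to the query.
    print(scoped_query(123, "name:smith"))
```

With separate tables the customer never appears in the query, so one customer's table can be rebuilt or remodeled without touching the others. With a single table every query has to carry the customer clause, but a cross-customer search is just the same query without it.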
I have Blur 1.x running on CDH3U5. I think it will work back down to CDH3U2 at least, and that's Hadoop 0.20 in both cases. I have not tried 0.23, though I will need to soon.

On Fri, Dec 21, 2012 at 10:51 AM, James Kebinger <[email protected]> wrote:

> Hello, I'm hoping to kick the tires on apache blur in the near future. I
> have a couple of quick questions before I set out.
>
> What version(s) of hadoop are required/supported at present?
>
> We have lots of data to index, but we always search within a particular
> customer's data set. Would the best practice be to put all of the data in
> one table and have the customer id in all of the queries, or build separate
> tables for each customer_id (like users-1, users-123 etc).
>
> Thanks, and happy holidays!
>
> -James Kebinger
