The first thing I can think of is to not use your userid directly as the rowId. Instead, hash it, encrypt it, or use a combination of the two to get more evenly distributed keys, which keeps the shards more evenly sized.
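
Roughly what I have in mind (a plain Java sketch, nothing Blur-specific; the class and method names are made up for illustration):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class RowIdHasher {

      // Derive an evenly distributed rowId by hashing the userid.
      // The raw userid can still be stored as a normal column so it
      // stays searchable; only the rowId changes.
      public static String rowIdFor(String userId) {
        try {
          MessageDigest md = MessageDigest.getInstance("SHA-256");
          byte[] digest = md.digest(userId.getBytes(StandardCharsets.UTF_8));
          StringBuilder hex = new StringBuilder(digest.length * 2);
          for (byte b : digest) {
            hex.append(String.format("%02x", b));
          }
          return hex.toString();
        } catch (NoSuchAlgorithmException e) {
          throw new RuntimeException(e);
        }
      }

      public static void main(String[] args) {
        // The same userid always yields the same rowId, so point lookups still work.
        System.out.println(rowIdFor("user-3141592"));
      }
    }

Since the hash is deterministic you can recompute it on the query side, so lookups by userid keep working while the keys spread evenly across shards.
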
Second thing that comes to mind is to periodically rebuild the index from scratch and up the number of shards. How many rows are you expecting per user? Avg row width? ~Garrett On Thu, Sep 26, 2013 at 5:25 AM, Ravikumar Govindarajan < [email protected]> wrote: > Rahul & Aaron, > > I will take a look at the alternate approach of having different clusters. > Sounds like a very promising method. > > I have another question related to BlurPartitioner. > > I assume that rowId and it's data resides in only one shard. Is this > correct? In that case, how to handle a single-shard that become too > unwieldy over a period of time? [serving too much data and/or too many > rowIds]. What are your suggestions. > > -- > Ravi > > > On Tue, Sep 24, 2013 at 6:36 AM, Aaron McCurry <[email protected]> wrote: > > > Ravi, > > > > See my comments below. > > > > On Mon, Sep 23, 2013 at 8:48 AM, Ravikumar Govindarajan < > > [email protected]> wrote: > > > > > Rahul, > > > > > > "When a search request is issued to the Blur Controller it searches > > > though all the shard servers in parallel and each shard server searches > > > through all of its shards. > > > Unlike database partitioning, I believe we cannot direct a search > to > > a > > > particular shard. > > > However > > > 1. Upon shard server start, all the shards are warmed up ie > Index > > > Reader's for each shard is loaded into memory > > > 2. Blur uses a block level cache. With sufficient memory > > allocated > > > to the cache, performance will be greatly enhanced" > > > > > > You are spot-on that what I was referring was a DB-Sharding like > > technique > > > and it definitely has some advantages, at least in our set-up. > > > > > > I think, what it translates in Blur, is to create N identical tables > and > > > maintain user-wise mappings in our app. > > > > > > I mean lets say I create a table for every 10k users. I will end up > with > > > 300 tables for 3 million users. What are the problems you foresee with > > that > > > large number of tables? I know for sure some K-V stores prohibit such > an > > > approach. > > > > > > > I think that this is valid approach, however I wouldn't optimize too > soon. > > I think the performance will likely be better with one larger table (or > at > > least fewer) than hundreds of smaller tables. However if you need to > split > > into separate tables there is a feature in Blur that you will likely be > > interested in using. > > > > In Blur the controllers act as a gateway/router for the shard cluster. > > However they can access more than just a single shard cluster. If you > > name each shard cluster a different name (blur-site.properties file) the > > controllers see all of them and make all the tables accessible through > the > > controller cluster. > > > > For example: > > > > Given: > > > > Shard Cluster A (100 servers) 10 tables > > Shard Cluster B (100 servers) 10 tables > > Shard Cluster C (100 servers) 10 tables > > Shard Cluster D (100 servers) 10 tables > > > > The controllers would present all 40 tables to the clients. I have > > normally used this feature to do full replacements of MapReduced indexes > > onto a off cluster while another was presenting the data users were > > accessing. Once the indexing was complete the application was soft > > switched to use the new table. > > > > Just some more ideas to think about. 
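
To spell out why a hot rowId is hard to split, and what the table-per-N-users routing could look like on the application side, here is a rough sketch. This is not Blur's actual BlurPartitioner, just the usual hash-mod idea, and the table naming scheme is invented for the example:

    public class RoutingSketch {

      // Hash-mod partitioning: a given rowId always maps to exactly one shard,
      // so all records for that rowId live together and a single oversized
      // rowId cannot be spread across shards.
      public static int shardFor(String rowId, int numShards) {
        return (rowId.hashCode() & 0x7fffffff) % numShards;
      }

      // App-side routing for the "one table per N users" idea: the application,
      // not Blur, decides which logical table a user maps to. The "users_"
      // prefix is made up for this example.
      public static String tableFor(String userId, int numTables) {
        return "users_" + ((userId.hashCode() & 0x7fffffff) % numTables);
      }

      public static void main(String[] args) {
        System.out.println(shardFor("user-3141592", 10000)); // one of 10k shards
        System.out.println(tableFor("user-3141592", 300));   // one of 300 tables
      }
    }

If those tables end up spread over several shard clusters as Aaron describes, the controllers still present all of them, so a mapping like this is the only routing the application has to do.
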
> > > > Aaron > > > > > > > > > > > > -- > > > Ravi > > > > > > > > > On Thu, Sep 19, 2013 at 11:16 PM, rahul challapalli < > > > [email protected]> wrote: > > > > > > > I will attempt to answer some of your questions below. Aaron or > someone > > > > else can correct me if I am wrong > > > > > > > > > > > > On Thu, Sep 19, 2013 at 6:15 AM, Ravikumar Govindarajan < > > > > [email protected]> wrote: > > > > > > > > > Thanks Aaron. I think, it has answered my question. I have a few > more > > > and > > > > > would be great if you can clarify them > > > > > > > > > > 1. Is the number of shards per-table fixed during table-creation or > > we > > > > can > > > > > dynamically allocate shards? > > > > > > > > > > > > > I believe we cannot dynamically allocate shards. The only thing we > > can > > > > dynamically add to an existing table is new columns > > > > > > > > > > > > > > 2. Assuming I have 10k shards with each shard-size=2GB, for a total > > of > > > 20 > > > > > TB table size. I typically use RowId = UserId and there are approx > 3 > > > > > million users, in our system > > > > > > > > > > How do I ensure that when a user issues a query, I should not > > > end-up > > > > > searching all these 10k shards, but rather search only a very small > > set > > > > of > > > > > shards? > > > > > > > > > > > > > When a search request is issued to the Blur Controller it > searches > > > > though all the shard servers in parallel and each shard server > searches > > > > through all of its shards. > > > > Unlike database partitioning, I believe we cannot direct a search > > to > > > a > > > > particular shard. > > > > However > > > > 1. Upon shard server start, all the shards are warmed up ie > > Index > > > > Reader's for each shard is loaded into memory > > > > 2. Blur uses a block level cache. With sufficient memory > > > allocated > > > > to the cache, performance will be greatly enhanced > > > > > > > > > > > > > > > > > > 3. Are there any advantages of running shard-server and data-nodes > > > {HDFS} > > > > > in the same machine? > > > > > > > > > > Someone else can provide a better answer here. > > > > In a typical Hadoop installation Task Trackers and Data Nodes run > > > > alongside each other on the same machine. Since data nodes store the > > > first > > > > block replica on the > > > > same machine, shard servers might see an advantage in terms of > > > network > > > > latency. However I think it is not a good idea to run Blur alongside > > Task > > > > Trackers for > > > > performance reasons > > > > > > > > > > > > > > > > > -- > > > > > Ravi > > > > > > > > > > > > > > > On Thu, Sep 19, 2013 at 2:36 AM, Aaron McCurry <[email protected] > > > > > > wrote: > > > > > > > > > > > I will attempt to answer below: > > > > > > > > > > > > On Wed, Sep 18, 2013 at 1:54 AM, Ravikumar Govindarajan < > > > > > > [email protected]> wrote: > > > > > > > > > > > > > Thanks a bunch for a concise and quick reply. Few more > questions > > > > > > > > > > > > > > 1. Any pointers/links on how you plan to tackle the > availability > > > > > problem? > > > > > > > > > > > > > > Lets say we store-forward hints to the failed shard-server. > Won't > > > the > > > > > > HDFS > > > > > > > index-files differ in shard replicas? > > > > > > > > > > > > > > > > > > > I am in the process of documenting the strategy and will be > adding > > it > > > > to > > > > > > JIRA soon. 
The way I am planning on solving this problem doesn't > > > > involve > > > > > > storing the indexes in more than once in HDFS (which of course is > > > > > > replicated). > > > > > > > > > > > > > > > > > > > > > > > > > > 2. I did not phrase my question on cross-join correctly. Let me > > > > clarify > > > > > > > > > > > > > > RowKey = 123 > > > > > > > > > > > > > > RecId = 1000 > > > > > > > Family = "ACCOUNTS" > > > > > > > Col-Name = "NAME" > > > > > > > Col-Value = "ABC" > > > > > > > ...... > > > > > > > > > > > > > > RecId = 1001 > > > > > > > Family = "CONTACTS" > > > > > > > Col-Name = "NAME" > > > > > > > Col-Value = "XYZ" > > > > > > > Col-Name = "ACCOUNTS-NAME" [FK to RecId=1000] > > > > > > > Col-Value = "1000" > > > > > > > ....... > > > > > > > > > > > > > > Lets say the user specifies the search query as > > > > > > > key=123 AND name:(ABC OR XYZ) > > > > > > > > > > > > > > Initially I will apply this query to each of the Family types, > > > namely > > > > > > > "ACCOUNTS", "CONTACTS" etc.... and get their RecIds.. > > > > > > > > > > > > > > After this, I will have to filter "CONTACTS" family results, > > based > > > on > > > > > > > RecIds received from "ACCOUNTS" [Join within records of > different > > > > > family, > > > > > > > based on FK] > > > > > > > > > > > > > > Is something like this achievable? Can I design it differently > to > > > > > satisfy > > > > > > > my requirements? > > > > > > > > > > > > > > > > > > > I may not fully understand your scenario. > > > > > > > > > > > > As I understand your example above: > > > > > > > > > > > > Row { > > > > > > "id" => "123", > > > > > > "records" => [ > > > > > > Record { > > > > > > "recordId" => "1000", "family" => "accounts", > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > }, > > > > > > Record { > > > > > > "recordId" => "1001", "family" => "contacts", > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > } > > > > > > ] > > > > > > } > > > > > > > > > > > > Let me go through some example queries that we support: > > > > > > > > > > > > +<accounts.name:abc> +<contacts.name:abc> > > > > > > > > > > > > Another way of writing it would be: > > > > > > > > > > > > <accounts.name:abc> AND <contacts.name:abc> > > > > > > > > > > > > Would yield a hit on the Row, there aren't any FKs in Blur. 
> > > > > > > > > > > > However if there are some interesting queries that be done with > > more > > > > > > examples: > > > > > > > > > > > > Row { > > > > > > "id" => "123", > > > > > > "records" => [ > > > > > > Record { > > > > > > "recordId" => "1000", "family" => "accounts", > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > }, > > > > > > Record { > > > > > > "recordId" => "1001", "family" => "contacts", > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > } > > > > > > ] > > > > > > } > > > > > > > > > > > > Row { > > > > > > "id" => "456", > > > > > > "records" => [ > > > > > > Record { > > > > > > "recordId" => "1000", "family" => "accounts", > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > }, > > > > > > Record { > > > > > > "recordId" => "1001", "family" => "contacts", > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > }, > > > > > > Record { > > > > > > "recordId" => "1002", "family" => "contacts", > > > > > > "columns" => [Column {"name" => "def"}] > > > > > > } > > > > > > ] > > > > > > } > > > > > > > > > > > > > > > > > > Row { > > > > > > "id" => "789", > > > > > > "records" => [ > > > > > > Record { > > > > > > "recordId" => "1000", "family" => "accounts", > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > }, > > > > > > Record { > > > > > > "recordId" => "1002", "family" => "contacts", > > > > > > "columns" => [Column {"name" => "def"}] > > > > > > } > > > > > > ] > > > > > > } > > > > > > > > > > > > For the given query: "<accounts.name:abc> AND <contacts.name: > abc>" > > > > would > > > > > > yield 2 Row hits. 123 and 456 > > > > > > For the given query: "<accounts.name:abc> AND <contacts.name: > def>" > > > > would > > > > > > yield 2 Row hits. 456 and 789 > > > > > > For the given query: "<contacts.name:abc> AND <contacts.name: > def>" > > > > would > > > > > > yield 1 Row hit of 456. NOTICE that the family is the same here > > > > > > "contacts". > > > > > > > > > > > > Also in Blur you can turn off the Row query and just query the > > > records. > > > > > > This would be your typical Document like access. > > > > > > > > > > > > I fear that this has not answered your question, so if it hasn't > > > please > > > > > let > > > > > > me know. > > > > > > > > > > > > Thanks! > > > > > > > > > > > > Aaron > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Ravi > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Sep 17, 2013 at 7:01 PM, Aaron McCurry < > > [email protected] > > > > > > > > > > wrote: > > > > > > > > > > > > > > > First off let me say welcome! Hopefully I can answer your > > > > questions > > > > > > > inline > > > > > > > > below. > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Sep 17, 2013 at 6:52 AM, Ravikumar Govindarajan < > > > > > > > > [email protected]> wrote: > > > > > > > > > > > > > > > > > I am quite new to Blur and need some help with the > following > > > > > > questions > > > > > > > > > > > > > > > > > > 1. Lets say I have a replication_factor=3 for all HDFS > > indexes. > > > > In > > > > > > case > > > > > > > > one > > > > > > > > > of the server hosting HDFS indexes goes down [temporary or > > > > > > take-down], > > > > > > > > what > > > > > > > > > will happen to writes? Some kind-of HintedHandoff [as in > > > > Cassandra] > > > > > > is > > > > > > > > > supported? 
> > > > > > > > > > > > > > > > > > > > > > > > > When there is a Blur Shard Server failure state in ZooKeeper > > will > > > > > > change > > > > > > > > and the other shard servers will take action to bring the > down > > > > > shard(s) > > > > > > > > online. This is similar to the HBase region model. While > the > > > > > shard(s) > > > > > > > are > > > > > > > > being relocated (which really means being reopened from HDFS) > > > > writes > > > > > to > > > > > > > the > > > > > > > > shard(s) being moved are not available. However the bulk > load > > > > > > capability > > > > > > > > is always available as long as HDFS is available, this can be > > > used > > > > > > > through > > > > > > > > Hadoop MapReduce. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To re-phrase, what is the Consistency Vs Availability > > trade-off > > > > in > > > > > > > Blur, > > > > > > > > > with replication_factor>1 for HDFS indexes? > > > > > > > > > > > > > > > > > > > > > > > > > Of the two Consistency is favored over Availability, however > we > > > are > > > > > > > > starting development (in 0.3.0) to increase availability > during > > > > > > failures. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. Since HDFSInputStream is used underneath, will this > result > > > in > > > > > too > > > > > > > much > > > > > > > > > of data-transfer back-and-forth? A case of > > multi-segment-merge > > > or > > > > > > even > > > > > > > > > wild-card search could trigger it. > > > > > > > > > > > > > > > > > > > > > > > > > Blur uses an in process file system cache (Block Cache is the > > > term > > > > > used > > > > > > > in > > > > > > > > the code) to reduce the IO from HDFS. During index merges > data > > > > that > > > > > is > > > > > > > not > > > > > > > > in the Block Cache is read from HDFS and the output is > written > > > back > > > > > to > > > > > > > > HDFS. Overall once an index is hot (been online for some > time) > > > the > > > > > IO > > > > > > > for > > > > > > > > any given search is fairly small assuming that the cluster > has > > > > enough > > > > > > > > memory configured in the Block Cache. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. Does Blur also support foreign-key like semantics to > > search > > > > > across > > > > > > > > > column-families as well as delete using row_id? > > > > > > > > > > > > > > > > > > > > > > > > > Blur supports something called Row Queries that allow for > > > searches > > > > > > across > > > > > > > > column families within single Rows. Take a look at this page > > > for a > > > > > > > better > > > > > > > > explanation: > > > > > > > > > > > > > > > > > > > > http://incubator.apache.org/blur/docs/0.2.0/data-model.html#querying > > > > > > > > > > > > > > > > And yes Blur supports deletes by Row check out: > > > > > > > > > > > > > > > > > > > > http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Fn_Blur_mutate > > > > > > > > and > > > > > > > > > > > > > > > > > > > > http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_RowMutation > > > > > > > > > > > > > > > > Hopefully this can answer so of your questions. Let us know > if > > > you > > > > > > have > > > > > > > > any more. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Aaron > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Ravi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
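
One more thing that might help: to make Aaron's Row-query examples from earlier in the thread concrete, here is a small self-contained sketch. It is plain Java with no Blur APIs; the row and record contents are copied from his three example Rows, and it reproduces the hits he lists.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class RowQuerySketch {

      static class Rec {
        final String family;
        final String name;
        Rec(String family, String name) { this.family = family; this.name = name; }
      }

      // The three example Rows from the thread, keyed by rowId.
      static final Map<String, List<Rec>> ROWS = new LinkedHashMap<>();
      static {
        ROWS.put("123", Arrays.asList(new Rec("accounts", "abc"), new Rec("contacts", "abc")));
        ROWS.put("456", Arrays.asList(new Rec("accounts", "abc"), new Rec("contacts", "abc"),
                                      new Rec("contacts", "def")));
        ROWS.put("789", Arrays.asList(new Rec("accounts", "abc"), new Rec("contacts", "def")));
      }

      // A Row is a hit when every clause ("family.name:value") matches some record
      // in that Row; different clauses may be satisfied by different records.
      static boolean rowHits(List<Rec> recs, String[] clauses) {
        for (String clause : clauses) {
          String[] parts = clause.split("[.:]"); // -> family, field, value
          boolean matched = false;
          for (Rec r : recs) {
            if (r.family.equals(parts[0]) && r.name.equals(parts[2])) {
              matched = true;
              break;
            }
          }
          if (!matched) {
            return false;
          }
        }
        return true;
      }

      static void run(String... clauses) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<Rec>> e : ROWS.entrySet()) {
          if (rowHits(e.getValue(), clauses)) {
            hits.add(e.getKey());
          }
        }
        System.out.println(String.join(" AND ", clauses) + " -> " + hits);
      }

      public static void main(String[] args) {
        run("accounts.name:abc", "contacts.name:abc"); // -> [123, 456]
        run("accounts.name:abc", "contacts.name:def"); // -> [456, 789]
        run("contacts.name:abc", "contacts.name:def"); // -> [456]
      }
    }

Turning off the Row query, as Aaron mentions, would instead match each clause against individual records, which behaves more like a flat document index.
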
