Ravi,

You can definitely walk through the row ids in Blur and count how many records a given row contains. However, there is no built-in way to determine the size of a given row (or of an individual record) without fetching the row and inspecting it. This sounds like it could be a useful feature for detecting row hotspots in the shards.
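For reference, a minimal sketch of that fetch-and-inspect approach is below. It assumes the 0.2.0 Thrift client API (BlurClient.getClient, fetchRow, Selector); those class and method names, along with the connection string, table name and row id, are assumptions/placeholders and may differ between releases, and the byte count is only a rough illustration rather than an on-disk measurement.

    import org.apache.blur.thrift.BlurClient;
    import org.apache.blur.thrift.generated.Blur.Iface;
    import org.apache.blur.thrift.generated.Column;
    import org.apache.blur.thrift.generated.FetchResult;
    import org.apache.blur.thrift.generated.Record;
    import org.apache.blur.thrift.generated.Row;
    import org.apache.blur.thrift.generated.Selector;

    public class RowHotspotProbe {
      public static void main(String[] args) throws Exception {
        // Connection string, table name and row id are placeholders.
        Iface client = BlurClient.getClient("controller1:40010");

        Selector selector = new Selector();
        selector.setRowId("some-user-id");

        FetchResult result = client.fetchRow("my_table", selector);
        Row row = result.getRowResult().getRow();   // null checks omitted for brevity

        long approxBytes = 0;
        for (Record record : row.getRecords()) {
          for (Column column : record.getColumns()) {
            // Rough per-record estimate: sum of column name/value lengths.
            approxBytes += column.getName().length() + column.getValue().length();
          }
        }
        System.out.println("row=" + row.getId()
            + " records=" + row.getRecordsSize()
            + " approxBytes=" + approxBytes);
      }
    }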
The current scheme for distributing data is through the BlurPartitioner and it is a basic hash of the rowid. So that should provide a fairly even distribution of rows but with no regard to the row's size or the size of the shard. In 0.3.0 we will be beginning to expose internal APIs that should allow for more control of the shard layout in the cluster (meaning what machine it is beginning served from). We also have a task to deal with large rows in a better way. https://issues.apache.org/jira/browse/BLUR-220 Aaron On Thu, Sep 26, 2013 at 9:30 PM, Ravikumar Govindarajan < [email protected]> wrote: > Hi, > > The statistics are as follows > > 1. Few hundreds of users - 1-10 GB of indexes > 2. Few thousands of users - 100 MB - 1 GB of indexes > 3. All others - < 100 MB of indexes > > The system has very few heavy-weight users, a little more middle-weight and > then lots of light-weight users. > > Yes, like you had said I might need to split my row-keys internally for > handling large data per-user. > > The problem with such an approach is that IDF will get distributed for a > rowId, impacting either scores or response times. > > Alternatively, I was looking at periodically running schedules to detect > the size of a row for a given user and then isolate the heavy-weights to a > separate shard. > > Is such an operation supported in Blur? > > -- > Ravi > > > On Thu, Sep 26, 2013 at 5:56 PM, Garrett Barton <[email protected] > >wrote: > > > First thing I can think of is to not use your userid key directly for the > > rowid. Instead hash/encrypt or a combination of the two to guarantee more > > evenly partitioned keys thus making the shards more even. > > > > Second thing that comes to mind is to periodically rebuild the index from > > scratch and up the number of shards. > > > > How many rows are you expecting per user? Avg row width? > > > > ~Garrett > > > > > > > > > > On Thu, Sep 26, 2013 at 5:25 AM, Ravikumar Govindarajan < > > [email protected]> wrote: > > > > > Rahul & Aaron, > > > > > > I will take a look at the alternate approach of having different > > clusters. > > > Sounds like a very promising method. > > > > > > I have another question related to BlurPartitioner. > > > > > > I assume that rowId and it's data resides in only one shard. Is this > > > correct? In that case, how to handle a single-shard that become too > > > unwieldy over a period of time? [serving too much data and/or too many > > > rowIds]. What are your suggestions. > > > > > > -- > > > Ravi > > > > > > > > > On Tue, Sep 24, 2013 at 6:36 AM, Aaron McCurry <[email protected]> > > wrote: > > > > > > > Ravi, > > > > > > > > See my comments below. > > > > > > > > On Mon, Sep 23, 2013 at 8:48 AM, Ravikumar Govindarajan < > > > > [email protected]> wrote: > > > > > > > > > Rahul, > > > > > > > > > > "When a search request is issued to the Blur Controller it > > searches > > > > > though all the shard servers in parallel and each shard server > > searches > > > > > through all of its shards. > > > > > Unlike database partitioning, I believe we cannot direct a > search > > > to > > > > a > > > > > particular shard. > > > > > However > > > > > 1. Upon shard server start, all the shards are warmed up ie > > > Index > > > > > Reader's for each shard is loaded into memory > > > > > 2. Blur uses a block level cache. 
With sufficient memory > > > > allocated > > > > > to the cache, performance will be greatly enhanced" > > > > > > > > > > You are spot-on that what I was referring was a DB-Sharding like > > > > technique > > > > > and it definitely has some advantages, at least in our set-up. > > > > > > > > > > I think, what it translates in Blur, is to create N identical > tables > > > and > > > > > maintain user-wise mappings in our app. > > > > > > > > > > I mean lets say I create a table for every 10k users. I will end up > > > with > > > > > 300 tables for 3 million users. What are the problems you foresee > > with > > > > that > > > > > large number of tables? I know for sure some K-V stores prohibit > such > > > an > > > > > approach. > > > > > > > > > > > > > I think that this is valid approach, however I wouldn't optimize too > > > soon. > > > > I think the performance will likely be better with one larger table > > (or > > > at > > > > least fewer) than hundreds of smaller tables. However if you need to > > > split > > > > into separate tables there is a feature in Blur that you will likely > be > > > > interested in using. > > > > > > > > In Blur the controllers act as a gateway/router for the shard > cluster. > > > > However they can access more than just a single shard cluster. If > you > > > > name each shard cluster a different name (blur-site.properties file) > > the > > > > controllers see all of them and make all the tables accessible > through > > > the > > > > controller cluster. > > > > > > > > For example: > > > > > > > > Given: > > > > > > > > Shard Cluster A (100 servers) 10 tables > > > > Shard Cluster B (100 servers) 10 tables > > > > Shard Cluster C (100 servers) 10 tables > > > > Shard Cluster D (100 servers) 10 tables > > > > > > > > The controllers would present all 40 tables to the clients. I have > > > > normally used this feature to do full replacements of MapReduced > > indexes > > > > onto a off cluster while another was presenting the data users were > > > > accessing. Once the indexing was complete the application was soft > > > > switched to use the new table. > > > > > > > > Just some more ideas to think about. > > > > > > > > Aaron > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Ravi > > > > > > > > > > > > > > > On Thu, Sep 19, 2013 at 11:16 PM, rahul challapalli < > > > > > [email protected]> wrote: > > > > > > > > > > > I will attempt to answer some of your questions below. Aaron or > > > someone > > > > > > else can correct me if I am wrong > > > > > > > > > > > > > > > > > > On Thu, Sep 19, 2013 at 6:15 AM, Ravikumar Govindarajan < > > > > > > [email protected]> wrote: > > > > > > > > > > > > > Thanks Aaron. I think, it has answered my question. I have a > few > > > more > > > > > and > > > > > > > would be great if you can clarify them > > > > > > > > > > > > > > 1. Is the number of shards per-table fixed during > table-creation > > or > > > > we > > > > > > can > > > > > > > dynamically allocate shards? > > > > > > > > > > > > > > > > > > > I believe we cannot dynamically allocate shards. The only > thing > > we > > > > can > > > > > > dynamically add to an existing table is new columns > > > > > > > > > > > > > > > > > > > > 2. Assuming I have 10k shards with each shard-size=2GB, for a > > total > > > > of > > > > > 20 > > > > > > > TB table size. 
I typically use RowId = UserId and there are > > approx > > > 3 > > > > > > > million users, in our system > > > > > > > > > > > > > > How do I ensure that when a user issues a query, I should > not > > > > > end-up > > > > > > > searching all these 10k shards, but rather search only a very > > small > > > > set > > > > > > of > > > > > > > shards? > > > > > > > > > > > > > > > > > > > When a search request is issued to the Blur Controller it > > > searches > > > > > > though all the shard servers in parallel and each shard server > > > searches > > > > > > through all of its shards. > > > > > > Unlike database partitioning, I believe we cannot direct a > > search > > > > to > > > > > a > > > > > > particular shard. > > > > > > However > > > > > > 1. Upon shard server start, all the shards are warmed up > ie > > > > Index > > > > > > Reader's for each shard is loaded into memory > > > > > > 2. Blur uses a block level cache. With sufficient memory > > > > > allocated > > > > > > to the cache, performance will be greatly enhanced > > > > > > > > > > > > > > > > > > > > > > > > > > 3. Are there any advantages of running shard-server and > > data-nodes > > > > > {HDFS} > > > > > > > in the same machine? > > > > > > > > > > > > > > Someone else can provide a better answer here. > > > > > > In a typical Hadoop installation Task Trackers and Data Nodes > > run > > > > > > alongside each other on the same machine. Since data nodes store > > the > > > > > first > > > > > > block replica on the > > > > > > same machine, shard servers might see an advantage in terms > of > > > > > network > > > > > > latency. However I think it is not a good idea to run Blur > > alongside > > > > Task > > > > > > Trackers for > > > > > > performance reasons > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Ravi > > > > > > > > > > > > > > > > > > > > > On Thu, Sep 19, 2013 at 2:36 AM, Aaron McCurry < > > [email protected] > > > > > > > > > > wrote: > > > > > > > > > > > > > > > I will attempt to answer below: > > > > > > > > > > > > > > > > On Wed, Sep 18, 2013 at 1:54 AM, Ravikumar Govindarajan < > > > > > > > > [email protected]> wrote: > > > > > > > > > > > > > > > > > Thanks a bunch for a concise and quick reply. Few more > > > questions > > > > > > > > > > > > > > > > > > 1. Any pointers/links on how you plan to tackle the > > > availability > > > > > > > problem? > > > > > > > > > > > > > > > > > > Lets say we store-forward hints to the failed shard-server. > > > Won't > > > > > the > > > > > > > > HDFS > > > > > > > > > index-files differ in shard replicas? > > > > > > > > > > > > > > > > > > > > > > > > > I am in the process of documenting the strategy and will be > > > adding > > > > it > > > > > > to > > > > > > > > JIRA soon. The way I am planning on solving this problem > > doesn't > > > > > > involve > > > > > > > > storing the indexes in more than once in HDFS (which of > course > > is > > > > > > > > replicated). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. I did not phrase my question on cross-join correctly. > Let > > me > > > > > > clarify > > > > > > > > > > > > > > > > > > RowKey = 123 > > > > > > > > > > > > > > > > > > RecId = 1000 > > > > > > > > > Family = "ACCOUNTS" > > > > > > > > > Col-Name = "NAME" > > > > > > > > > Col-Value = "ABC" > > > > > > > > > ...... 
> > > > > > > > > > > > > > > > > > RecId = 1001 > > > > > > > > > Family = "CONTACTS" > > > > > > > > > Col-Name = "NAME" > > > > > > > > > Col-Value = "XYZ" > > > > > > > > > Col-Name = "ACCOUNTS-NAME" [FK to RecId=1000] > > > > > > > > > Col-Value = "1000" > > > > > > > > > ....... > > > > > > > > > > > > > > > > > > Lets say the user specifies the search query as > > > > > > > > > key=123 AND name:(ABC OR XYZ) > > > > > > > > > > > > > > > > > > Initially I will apply this query to each of the Family > > types, > > > > > namely > > > > > > > > > "ACCOUNTS", "CONTACTS" etc.... and get their RecIds.. > > > > > > > > > > > > > > > > > > After this, I will have to filter "CONTACTS" family > results, > > > > based > > > > > on > > > > > > > > > RecIds received from "ACCOUNTS" [Join within records of > > > different > > > > > > > family, > > > > > > > > > based on FK] > > > > > > > > > > > > > > > > > > Is something like this achievable? Can I design it > > differently > > > to > > > > > > > satisfy > > > > > > > > > my requirements? > > > > > > > > > > > > > > > > > > > > > > > > > I may not fully understand your scenario. > > > > > > > > > > > > > > > > As I understand your example above: > > > > > > > > > > > > > > > > Row { > > > > > > > > "id" => "123", > > > > > > > > "records" => [ > > > > > > > > Record { > > > > > > > > "recordId" => "1000", "family" => "accounts", > > > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > > > }, > > > > > > > > Record { > > > > > > > > "recordId" => "1001", "family" => "contacts", > > > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > > > } > > > > > > > > ] > > > > > > > > } > > > > > > > > > > > > > > > > Let me go through some example queries that we support: > > > > > > > > > > > > > > > > +<accounts.name:abc> +<contacts.name:abc> > > > > > > > > > > > > > > > > Another way of writing it would be: > > > > > > > > > > > > > > > > <accounts.name:abc> AND <contacts.name:abc> > > > > > > > > > > > > > > > > Would yield a hit on the Row, there aren't any FKs in Blur. 
> > > > > > > > > > > > > > > > However if there are some interesting queries that be done > with > > > > more > > > > > > > > examples: > > > > > > > > > > > > > > > > Row { > > > > > > > > "id" => "123", > > > > > > > > "records" => [ > > > > > > > > Record { > > > > > > > > "recordId" => "1000", "family" => "accounts", > > > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > > > }, > > > > > > > > Record { > > > > > > > > "recordId" => "1001", "family" => "contacts", > > > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > > > } > > > > > > > > ] > > > > > > > > } > > > > > > > > > > > > > > > > Row { > > > > > > > > "id" => "456", > > > > > > > > "records" => [ > > > > > > > > Record { > > > > > > > > "recordId" => "1000", "family" => "accounts", > > > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > > > }, > > > > > > > > Record { > > > > > > > > "recordId" => "1001", "family" => "contacts", > > > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > > > }, > > > > > > > > Record { > > > > > > > > "recordId" => "1002", "family" => "contacts", > > > > > > > > "columns" => [Column {"name" => "def"}] > > > > > > > > } > > > > > > > > ] > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > Row { > > > > > > > > "id" => "789", > > > > > > > > "records" => [ > > > > > > > > Record { > > > > > > > > "recordId" => "1000", "family" => "accounts", > > > > > > > > "columns" => [Column {"name" => "abc"}] > > > > > > > > }, > > > > > > > > Record { > > > > > > > > "recordId" => "1002", "family" => "contacts", > > > > > > > > "columns" => [Column {"name" => "def"}] > > > > > > > > } > > > > > > > > ] > > > > > > > > } > > > > > > > > > > > > > > > > For the given query: "<accounts.name:abc> AND <contacts.name > : > > > abc>" > > > > > > would > > > > > > > > yield 2 Row hits. 123 and 456 > > > > > > > > For the given query: "<accounts.name:abc> AND <contacts.name > : > > > def>" > > > > > > would > > > > > > > > yield 2 Row hits. 456 and 789 > > > > > > > > For the given query: "<contacts.name:abc> AND <contacts.name > : > > > def>" > > > > > > would > > > > > > > > yield 1 Row hit of 456. NOTICE that the family is the same > > here > > > > > > > > "contacts". > > > > > > > > > > > > > > > > Also in Blur you can turn off the Row query and just query > the > > > > > records. > > > > > > > > This would be your typical Document like access. > > > > > > > > > > > > > > > > I fear that this has not answered your question, so if it > > hasn't > > > > > please > > > > > > > let > > > > > > > > me know. > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > Aaron > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Ravi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Sep 17, 2013 at 7:01 PM, Aaron McCurry < > > > > [email protected] > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > First off let me say welcome! Hopefully I can answer > your > > > > > > questions > > > > > > > > > inline > > > > > > > > > > below. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Sep 17, 2013 at 6:52 AM, Ravikumar Govindarajan < > > > > > > > > > > [email protected]> wrote: > > > > > > > > > > > > > > > > > > > > > I am quite new to Blur and need some help with the > > > following > > > > > > > > questions > > > > > > > > > > > > > > > > > > > > > > 1. Lets say I have a replication_factor=3 for all HDFS > > > > indexes. 
> > > > > > In > > > > > > > > case > > > > > > > > > > one > > > > > > > > > > > of the server hosting HDFS indexes goes down [temporary > > or > > > > > > > > take-down], > > > > > > > > > > what > > > > > > > > > > > will happen to writes? Some kind-of HintedHandoff [as > in > > > > > > Cassandra] > > > > > > > > is > > > > > > > > > > > supported? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > When there is a Blur Shard Server failure state in > > ZooKeeper > > > > will > > > > > > > > change > > > > > > > > > > and the other shard servers will take action to bring the > > > down > > > > > > > shard(s) > > > > > > > > > > online. This is similar to the HBase region model. > While > > > the > > > > > > > shard(s) > > > > > > > > > are > > > > > > > > > > being relocated (which really means being reopened from > > HDFS) > > > > > > writes > > > > > > > to > > > > > > > > > the > > > > > > > > > > shard(s) being moved are not available. However the bulk > > > load > > > > > > > > capability > > > > > > > > > > is always available as long as HDFS is available, this > can > > be > > > > > used > > > > > > > > > through > > > > > > > > > > Hadoop MapReduce. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To re-phrase, what is the Consistency Vs Availability > > > > trade-off > > > > > > in > > > > > > > > > Blur, > > > > > > > > > > > with replication_factor>1 for HDFS indexes? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Of the two Consistency is favored over Availability, > > however > > > we > > > > > are > > > > > > > > > > starting development (in 0.3.0) to increase availability > > > during > > > > > > > > failures. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. Since HDFSInputStream is used underneath, will this > > > result > > > > > in > > > > > > > too > > > > > > > > > much > > > > > > > > > > > of data-transfer back-and-forth? A case of > > > > multi-segment-merge > > > > > or > > > > > > > > even > > > > > > > > > > > wild-card search could trigger it. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Blur uses an in process file system cache (Block Cache is > > the > > > > > term > > > > > > > used > > > > > > > > > in > > > > > > > > > > the code) to reduce the IO from HDFS. During index > merges > > > data > > > > > > that > > > > > > > is > > > > > > > > > not > > > > > > > > > > in the Block Cache is read from HDFS and the output is > > > written > > > > > back > > > > > > > to > > > > > > > > > > HDFS. Overall once an index is hot (been online for some > > > time) > > > > > the > > > > > > > IO > > > > > > > > > for > > > > > > > > > > any given search is fairly small assuming that the > cluster > > > has > > > > > > enough > > > > > > > > > > memory configured in the Block Cache. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. Does Blur also support foreign-key like semantics to > > > > search > > > > > > > across > > > > > > > > > > > column-families as well as delete using row_id? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Blur supports something called Row Queries that allow for > > > > > searches > > > > > > > > across > > > > > > > > > > column families within single Rows. 
Take a look at this > > page > > > > > for a > > > > > > > > > better > > > > > > > > > > explanation: > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://incubator.apache.org/blur/docs/0.2.0/data-model.html#querying > > > > > > > > > > > > > > > > > > > > And yes Blur supports deletes by Row check out: > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Fn_Blur_mutate > > > > > > > > > > and > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_RowMutation > > > > > > > > > > > > > > > > > > > > Hopefully this can answer so of your questions. Let us > > know > > > if > > > > > you > > > > > > > > have > > > > > > > > > > any more. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Aaron > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > Ravi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
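To make the Row-query examples quoted above concrete, here is a hedged sketch of running the "<accounts.name:abc> AND <contacts.name:abc>" style query through the Thrift client. It assumes the 0.2.0 API shape (BlurQuery wrapping a SimpleQuery, with superQueryOn=true selecting Row semantics); field and method names may differ in other releases, and the connection string and table name are placeholders.

    import org.apache.blur.thrift.BlurClient;
    import org.apache.blur.thrift.generated.Blur.Iface;
    import org.apache.blur.thrift.generated.BlurQuery;
    import org.apache.blur.thrift.generated.BlurResults;
    import org.apache.blur.thrift.generated.SimpleQuery;

    public class CrossFamilyRowQuery {
      public static void main(String[] args) throws Exception {
        Iface client = BlurClient.getClient("controller1:40010");

        // Row query across two column families; a hit means a single Row contains
        // a matching "accounts" record AND a matching "contacts" record.
        SimpleQuery simpleQuery = new SimpleQuery();
        simpleQuery.setQueryStr("+<accounts.name:abc> +<contacts.name:abc>");
        simpleQuery.setSuperQueryOn(true);   // true = Row query, false = Record query

        BlurQuery blurQuery = new BlurQuery();
        blurQuery.setSimpleQuery(simpleQuery);
        blurQuery.setFetch(10);              // number of results to fetch

        BlurResults results = client.query("my_table", blurQuery);
        System.out.println("total row hits: " + results.getTotalResults());
      }
    }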

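Finally, a toy illustration of the rowid-hash distribution described at the top of this message. This is not the actual BlurPartitioner source; it only shows why a plain hash of the row id spreads rows evenly across shards while ignoring how large each row is.

    // Toy stand-in for hash-based row partitioning (not Blur's real code).
    public class ToyRowPartitioner {
      private final int numShards;

      public ToyRowPartitioner(int numShards) {
        this.numShards = numShards;
      }

      public int shardFor(String rowId) {
        // Mask off the sign bit so the modulo result is never negative.
        return (rowId.hashCode() & Integer.MAX_VALUE) % numShards;
      }

      public static void main(String[] args) {
        ToyRowPartitioner p = new ToyRowPartitioner(16);
        // Every write/read for a given rowId lands on the same shard,
        // regardless of how large that row has grown.
        System.out.println("user-123 -> shard " + p.shardFor("user-123"));
        System.out.println("user-456 -> shard " + p.shardFor("user-456"));
      }
    }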