On 24 April 2010 at 23:59, Ryan Rawson <ryano...@gmail.com> wrote:
> On Sat, Apr 24, 2010 at 12:22 AM, Andrey Stepachev <oct...@gmail.com> wrote:
> > 2010/4/24 Andrew Nguyen <andrew-lists-hb...@ucsfcti.org>
> >
> >> Hello all,
> >>
> >> Each row key is of the form "PatientName-PhysiologicParameter" and each
> >> column name is the timestamp of the reading.
> >>
> >
> > With such a design in HBase (as opposed to Cassandra) you have to use row
> > filters to get only part of the data (for example, the last year), or
> > filter client-side over a row scan.
> > If the data series get big (>100 values in a row) you will hit the
> > intra-row scanning issue https://issues.apache.org/jira/browse/HBASE-1537,
> > as I did. Another issue, as mentioned before, is scaling: HBase splits
> > data by rows.
> >
> > You have to figure out how much data will end up in a row. If it runs to
> > hundreds of values, use a compound key (patient-code-date).
> > If the rows stay small, (patient-code) may be easier to work with, because
> > you can use Get operations with locks (if you need them); with a dated
> > key you can't (because scans don't yet honor locks).

> This statement is happily obsolete - the 0.20.4 RC has new code that makes
> it so that Gets and Scans never return partially updated rows. I
> dislike the term 'honor locks' because it implies an implementation
> strategy, and in this case Gets (which are now 1-row scans) and Scans
> do not acquire locks to accomplish their tasks. This is important
> because if you acquired a row lock (which is exclusive) you would only
> be able to have 1 read or write operation at a time, whereas we
> really want 1 write operation and as many read operations as possible.

No. I mean the scenario where I want to lock a row for writing: you can't
lock the row for all dates at once. With (patient-code) it is easy; with
(patient-code-date) you would have to use some artificial date or go to
ZooKeeper directly. Personally, I prefer compound keys, as I mentioned
before. The very important point in my message is intra-row scanning when a
row has a huge number of columns: a Scan will fail with OOM if you try to
read such a row.

> For example if you are storing timeseries data for a monitoring
> system, you might want to store it by row, since the number of points
> for a single system might be arbitrarily large (think: 2 years+ of
> data). In this case, if the expected data set size per row is larger
> than what a single machine could conceivably store, Cassandra would
> not work for you (since each row must be stored on a
> single (er, N) node(s)).

Really, in my reply I only said that Cassandra has an API for scanning
columns. I understand, t

2010/4/25 Andrew Nguyen <andrew-lists-hb...@ucsfcti.org>

> You mention tall tables - this sounds consistent with what Erik and Andrey
> have said. Given that, just to clarify my understanding, I'm probably
> looking at a single table with only one column (the value, which Andrey
> names as "series"?) and billions of rows, right?

Exactly. Using the PhysiologicParameter as the column qualifier is really
the better solution: series:ABP, series:HP, etc. (see the sketches below).
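
To make that concrete, a minimal sketch of the series:<parameter> layout
against the 0.20-era Java client (classes from org.apache.hadoop.hbase.client
and org.apache.hadoop.hbase.util.Bytes). The table name "vitals" and the
key/value formats are made-up assumptions, not anything from the thread:

    // Tall table: row key = patient + reading timestamp, one qualifier per
    // physiologic parameter under the "series" family. Names/values made up.
    HTable table = new HTable(new HBaseConfiguration(), "vitals");
    Put put = new Put(Bytes.toBytes("patient42-20100424235900"));
    put.add(Bytes.toBytes("series"), Bytes.toBytes("ABP"), Bytes.toBytes("120/80"));
    put.add(Bytes.toBytes("series"), Bytes.toBytes("HP"), Bytes.toBytes("72"));
    table.put(put);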
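
And for the compound-key (patient-code-date) layout preferred above, a
date-range read is a plain row scan over the key range, no row filter needed.
Again a sketch with assumed names; row keys sort lexicographically, which is
what lets the start/stop rows select the date range:

    // One row per reading, key = patient-code-date, a single cell per row.
    HTable table = new HTable(new HBaseConfiguration(), "vitals");

    // Read one year of ABP readings for one patient: the scan covers
    // [startRow, stopRow) in key order.
    Scan scan = new Scan(Bytes.toBytes("patient42-ABP-20090101"),
                         Bytes.toBytes("patient42-ABP-20100101"));
    scan.addColumn(Bytes.toBytes("series"), Bytes.toBytes("value"));
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        byte[] reading = r.getValue(Bytes.toBytes("series"),
                                    Bytes.toBytes("value"));
        // ... decode and process one reading per row ...
      }
    } finally {
      scanner.close();
    }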