A good way to think about this is that Accumulo provides a single index: the rowID.

You can find rows (or a row with a colfam, or a row with a colfam and colqual, etc.) very quickly, but anything else requires an exhaustive search.
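
To make the difference concrete, here is a toy Java model that treats a table as a `java.util.TreeMap`. It is not the Accumulo client API: the `row\0family\0qualifier` string key, the class name, and the helper methods are assumptions for illustration only, and visibility and timestamp are ignored. Fetching one row is a seek to a contiguous key range. Matching on the value has to visit every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of a table as one big sorted map; this is NOT the Accumulo API.
// Keys are "row\0family\0qualifier", so they sort by row, then family, then
// qualifier, much like Accumulo keys (visibility and timestamp are ignored).
public class RowLookupDemo {
    static final char SEP = '\u0000';  // separates row, family, qualifier
    static final char NEXT = '\u0001'; // smallest char greater than SEP

    static String key(String row, String fam, String qual) {
        return row + SEP + fam + SEP + qual;
    }

    // Sample data from the question below, stored by ID.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> t = new TreeMap<>();
        t.put(key("ID1", "info", "name"), "JhonSmith");
        t.put(key("ID1", "info", "birth"), "1988-06-26");
        t.put(key("ID1", "study", "university"), "ComputerEngineering");
        t.put(key("ID1", "study", "graduated"), "Yes");
        t.put(key("ID2", "info", "name"), "GeorgeDuff");
        t.put(key("ID2", "info", "birth"), "1984-01-29");
        t.put(key("ID2", "study", "university"), "Math");
        t.put(key("ID2", "study", "graduated"), "Yes");
        return t;
    }

    // Fast: every key of one row lies in the contiguous range [row+SEP, row+NEXT),
    // so the map can seek straight to it (this also keeps "ID10" out of "ID1").
    static SortedMap<String, String> scanRow(TreeMap<String, String> table, String row) {
        return table.subMap(row + SEP, row + NEXT);
    }

    // Slow: the map is not ordered by value, so every entry must be examined.
    static List<String> rowsWithValue(TreeMap<String, String> table, String value) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : table.entrySet()) {
            if (e.getValue().equals(value)) {
                String row = e.getKey().substring(0, e.getKey().indexOf(SEP));
                if (!rows.contains(row)) {
                    rows.add(row);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = sampleTable();
        System.out.println(scanRow(table, "ID1").size());      // prints 4
        System.out.println(rowsWithValue(table, "JhonSmith")); // prints [ID1]
    }
}
```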

Any time you want to search quickly on some other dimension of the data, you typically have to pivot your data so that the other dimension becomes the rowID, and the table is sorted by it.

If you want to search records by value, you have to put the value in the row and the ID in the Value (or at least somewhere else). Thankfully, Accumulo can very effectively store multiple indexes (inverted indexes, if you will) in a single table, because it allows dynamic column families.
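
A minimal sketch of that pivot, again using a `TreeMap` as a stand-in for one table rather than the real Accumulo client API. Each data entry is written alongside an index entry whose row is the value, whose column family names the indexed field, and whose qualifier holds the ID. The `idx_` family prefix, the class, and the method names are hypothetical; the point is that the column family keeps index entries apart from data entries even though both live in one table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy single-table layout: data rows plus inverted-index rows (NOT the Accumulo API).
public class SingleTableIndexDemo {
    static final char SEP = '\u0000'; // separates row, family, qualifier

    static String key(String row, String fam, String qual) {
        return row + SEP + fam + SEP + qual;
    }

    // Writes the data entry and its pivoted index entry into the same table.
    static void putWithIndex(TreeMap<String, String> table,
                             String id, String fam, String qual, String value) {
        // Data entry: sorted by ID.
        table.put(key(id, fam, qual), value);
        // Index entry: sorted by value. Hypothetical family "idx_<fam>:<qual>",
        // with the ID kept in the qualifier and an empty Value.
        table.put(key(value, "idx_" + fam + ":" + qual, id), "");
    }

    // Finds the IDs for a value with one range scan over the index entries.
    static List<String> idsForValue(TreeMap<String, String> table,
                                    String fam, String qual, String value) {
        String prefix = key(value, "idx_" + fam + ":" + qual, "");
        List<String> ids = new ArrayList<>();
        // tailMap seeks to the first key >= prefix; stop at the first key past it.
        for (String k : table.tailMap(prefix, true).keySet()) {
            if (!k.startsWith(prefix)) {
                break;
            }
            ids.add(k.substring(prefix.length()));
        }
        return ids;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        putWithIndex(table, "ID1", "info", "name", "JhonSmith");
        putWithIndex(table, "ID2", "info", "name", "GeorgeDuff");
        System.out.println(idsForValue(table, "info", "name", "JhonSmith")); // prints [ID1]
    }
}
```

Writing both entries from the same client call keeps the index in step with the data in this toy; a real deployment would also need to handle updates and deletes of indexed values.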

Christopher wrote:
Since Accumulo is essentially a big sorted map, it is most efficient
searching by the row. When you search by other fields, you are
searching the entire data set, and filtering. That is usually not very
efficient. The API provides a way to do this relatively easily by
specifying family or family:qualifier, but it does not (as you've
observed) make it easy to do this by Value.

There are a few options:

1. You can configure the RegExFilter as a scan-time iterator. (This is
going to be terribly inefficient.)
2. You can adopt a secondary indexing strategy.

I would do option #2. As you've described, your data is indexed by ID.
If you need an index on whatever you're storing in the Value, you
should make a new table (or new family/locality group) which stores
your data sorted by that instead of ID. You can either just store the
ID in this secondary index, and do two lookups (the secondary index to
find the ID, then the main data once you have the ID), or you can
store all the data a second time, ordered by the contents of your
Value (this trades space for performance).

There are more complex strategies, but these are the basics.
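
The two-lookup approach described above, applied to the data in the question below, might look like the following. As before, this is a toy model with `TreeMap`s standing in for a main table and a separate index table; the class, method names, and key layout are assumptions, not Accumulo API calls. The first lookup range-scans the index (row = value, qualifier = ID); the second range-scans each matching ID's row in the main table.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy two-lookup query: secondary index table, then main table (NOT the Accumulo API).
public class TwoLookupDemo {
    static final char SEP = '\u0000';  // separates row, family, qualifier
    static final char NEXT = '\u0001'; // smallest char greater than SEP

    static String key(String row, String fam, String qual) {
        return row + SEP + fam + SEP + qual;
    }

    final TreeMap<String, String> data = new TreeMap<>();  // row = ID
    final TreeMap<String, String> index = new TreeMap<>(); // row = value, qualifier = ID

    void put(String id, String fam, String qual, String value) {
        data.put(key(id, fam, qual), value);
        index.put(key(value, fam + ":" + qual, id), "");
    }

    // Returns ID -> (family:qualifier -> value) for every ID whose fam:qual equals value.
    Map<String, Map<String, String>> lookup(String fam, String qual, String value) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        // Lookup 1: range-scan the index to find the matching IDs.
        String prefix = key(value, fam + ":" + qual, "");
        for (String k : index.tailMap(prefix, true).keySet()) {
            if (!k.startsWith(prefix)) {
                break;
            }
            String id = k.substring(prefix.length());
            // Lookup 2: range-scan that ID's whole row in the main table.
            Map<String, String> row = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : data.subMap(id + SEP, id + NEXT).entrySet()) {
                String column = e.getKey().substring(id.length() + 1).replace(SEP, ':');
                row.put(column, e.getValue());
            }
            result.put(id, row);
        }
        return result;
    }

    public static void main(String[] args) {
        TwoLookupDemo t = new TwoLookupDemo();
        t.put("ID1", "info", "name", "JhonSmith");
        t.put("ID1", "info", "birth", "1988-06-26");
        t.put("ID1", "study", "university", "ComputerEngineering");
        t.put("ID1", "study", "graduated", "Yes");
        t.put("ID2", "info", "name", "GeorgeDuff");
        System.out.println(t.lookup("info", "name", "JhonSmith"));
    }
}
```

Storing all of the data a second time, ordered by the value, would drop the second lookup in exchange for the extra space, as described above.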

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Wed, May 6, 2015 at 10:10 AM, Revan1988 wrote:
Hi,
I've got another question about using Accumulo.

My table looks something like this:

ID1 info:name JhonSmith
ID1 info:birth 1988-06-26
ID1 study:university ComputerEngineering
ID1 study:graduated Yes

ID2 info:name GeorgeDuff
ID2 info:birth 1984-01-29
ID2 study:university Math
ID2 study:graduated Yes

...


I want all the info about JhonSmith, but in the Java API I've only found methods
to search by row, family, or family:qualifier...

I need to search by Value and then use its row (IDx) to fetch all the other
entries that have the same row (IDx).

For example, I need all the info about JhonSmith (birth, university, graduated,
...).

I hope I've explained my problem.
Sorry again for my bad English.

...and once again:
Thank you!!!



-----
Andrea Leoni
Italy
Computer Engineer
--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/Search-function-tp14030.html
Sent from the Developers mailing list archive at Nabble.com.
