[ https://issues.apache.org/jira/browse/ACCUMULO-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Newton resolved ACCUMULO-1417.
-----------------------------------
Resolution: Fixed
Code to ingest the Google Books ngrams was added. I posted some numbers on the
efficiency of the ingest and storage [here|http://tinyurl.com/nrvj7xv].
Other key-value stores can compare their numbers, if they like. Beating
compressed CSVs was an unexpected result.
> data storage efficiency
> -----------------------
>
> Key: ACCUMULO-1417
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1417
> Project: Accumulo
> Issue Type: Task
> Reporter: Eric Newton
>
> David Medinets wrote to the user list:
> {quote}
> Are there any published numbers for the amount of disk space used by
> Accumulo versus other products? I'm thinking some dataset like dbpedia
> or something from http://books.google.com/ngrams/datasets. If there is
> not such a comparison, what comparisons would you like to see? What
> about WordNet stored in CSV, MySQL, Cassandra, HBase, and Accumulo?
> WordNet is just a large set of CSV files so it would be a good
> candidate for this concept, I think.
> {quote}
> Good idea.