Hola Aaron,

Sorry it took four days to answer, but I was out of the office and had two birthdays to attend over the weekend ;)
>> Looking quickly at the code, it looks like field search is implemented
>> by performing a scan (with filtering) on the field search table in
>> HBase. In that case, I was wondering what use cases, design
>> considerations, or assumptions will be associated with
>> HBaseFieldSearch?

The project is called SCAPE (http://www.scape-project.eu/) and aims at creating a fully functional Digital Preservation System with a workflow engine, so that the system can perform different preservation actions (migrations, rebasing, emulation etc.) driven by a rule-based DSL. We're still at an early stage of designing the application and are still evaluating possibilities.

The use case is defined very loosely: a Content Repository System that is horizontally scalable on commodity hardware, preferably with the data stored in a way that makes parallel processing easy. This means the data should at least be easily exportable to HDFS, where MapReduce processing can take place efficiently. Our current idea is to use Fedora on the backend of the system, so I'm quite interested in developing around the HighlevelStore ideas you guys were thinking about implementing.

We also have to take the different sizes of digital objects into account, since storing small files is inefficient in HDFS and storing big files is inefficient in HBase, but the system should work as well with terabyte-sized media files as with small text objects. So we're thinking about deciding where to put an object based on its size: in HDFS, in an HBase table, or in Hadoop archives, SequenceFiles, or MapFiles.

The FieldSearch would be the endpoint the workflow engine uses to decide which objects to operate on, and the endpoint for a user who wants to search the whole repository for some metadata or datastream entry.
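To make the size-based placement idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the names `Backend` and `choose_backend`, the two threshold values, and the choice to send mid-sized objects to a packed container such as a SequenceFile are hypothetical, and nothing here is decided for SCAPE or taken from Fedora, HBase, or Hadoop.

```python
from enum import Enum


class Backend(Enum):
    """Hypothetical storage targets for a digital object."""
    HBASE = "hbase"                  # small objects stored as table cells
    SEQUENCE_FILE = "sequence_file"  # mid-sized objects packed into a container file
    HDFS = "hdfs"                    # large objects stored as plain HDFS files


# Assumed thresholds, purely illustrative; real values would need benchmarking.
HBASE_MAX_BYTES = 10 * 1024 * 1024   # up to 10 MiB goes to HBase
HDFS_MIN_BYTES = 64 * 1024 * 1024    # 64 MiB and above goes to plain HDFS


def choose_backend(size_bytes: int) -> Backend:
    """Pick a storage target from the object size alone."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= HBASE_MAX_BYTES:
        return Backend.HBASE
    if size_bytes < HDFS_MIN_BYTES:
        return Backend.SEQUENCE_FILE
    return Backend.HDFS
```

A caller would pass the size of an incoming datastream, for example `choose_backend(2 * 1024)` for a small text object, and route the write to whichever store comes back.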
The HBaseFieldSearch I implemented is nothing but a simple PoC, which operates on the HBase table structure I wrote about last time. But this is in no way how we think it should look in the end, and I think it's much more likely that we will operate on some kind of index, probably a Lucene index or even a whole Solr server.

I hope that clears up a bit what I'm trying to achieve, but please feel free to ask me any questions...

Kind regards,
Frank

--
Frank Asseg
ePublishing & eScience
Development & Applied Research
Phone +49 7247-808-515
Fax +49 7247 808-133
[email protected]

FIZ Karlsruhe – Leibniz Institute for Information Infrastructure
Hermann-von-Helmholtz-Platz 1
76344 Eggenstein-Leopoldshafen, Germany
http://www.fiz-karlsruhe.de/

-------------------------------------------------------
Fachinformationszentrum Karlsruhe, Gesellschaft für wissenschaftlich-technische Information mbH.
Registered office: Eggenstein-Leopoldshafen, Amtsgericht Mannheim HRB 101892.
Managing Director: Sabine Brünger-Weilandt.
Chairman of the Supervisory Board: MinDirig Dr. Thomas Greiner.
