Re: [fcrepo-dev] FieldSearch

Asseg, Frank Mon, 18 Jul 2011 02:55:04 -0700

Hola Aaron,

Sorry it took 4 days to answer, but I was out of the office and had 2 
birthdays to attend over the weekend ;)


>> Looking quickly at the code, it looks like field search is implemented
>> by performing a scan (with filtering) on the field search table in
>> HBase.  In that case, I was wondering what use cases, design
>> considerations, or assumptions will be associated with
>> HBaseFieldSearch?

  The project is called SCAPE (http://www.scape-project.eu/) and aims at 
creating a fully functional Digital Preservation System with a workflow 
engine that lets the system perform different preservation actions 
(migrations, rebasing, emulation etc.) using a rule-based DSL.
We're still in an early stage of designing the application we want to 
create and are still evaluating possibilities.
  The use case is defined very loosely: a Content Repository System 
which is horizontally scalable on commodity hardware, with the data 
preferably stored in a way that makes parallel processing easy. This 
means the data should at least be easily exportable to HDFS, where 
MapReduce processing can take place efficiently.
  Our current idea is to use Fedora on the backend of the system, 
therefore I'm quite interested in developing around the HighlevelStore 
ideas you guys were thinking about implementing.
  Also, we have to take the different sizes of digital objects into 
account, since storing small files is inefficient in HDFS and storing 
big files is inefficient in HBase, but the system should be designed so 
that it works as well with terabyte-sized media files as with small 
text objects. So we're thinking about deciding, based on an object's 
size, where to put it: in plain HDFS, in an HBase table, or in Hadoop 
archives, SequenceFiles or MapFiles.
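
  To make the size-based placement idea a bit more concrete, here is a 
rough sketch. The class name, the backend names and both thresholds are 
made up for illustration; nothing here is decided, and real cut-offs 
would have to come out of benchmarking.

```java
// Hypothetical sketch: thresholds and backend names are illustrative
// assumptions, not values from any actual design.
final class StorageRouter {
    enum Backend { HBASE, SEQUENCE_FILE, HDFS }

    // Assumed cut-offs; real values would need to be measured.
    static final long SMALL_MAX_BYTES = 1L * 1024 * 1024;    // 1 MiB
    static final long MEDIUM_MAX_BYTES = 64L * 1024 * 1024;  // 64 MiB

    // Pick a backend purely from the object's size in bytes.
    static Backend choose(long sizeBytes) {
        if (sizeBytes < 0) {
            throw new IllegalArgumentException("negative size: " + sizeBytes);
        }
        if (sizeBytes <= SMALL_MAX_BYTES) {
            return Backend.HBASE;          // tiny objects live in a table cell
        }
        if (sizeBytes <= MEDIUM_MAX_BYTES) {
            return Backend.SEQUENCE_FILE;  // mid-sized objects get packed together
        }
        return Backend.HDFS;               // large objects become plain HDFS files
    }

    public static void main(String[] args) {
        long[] sizes = {2_048L, 10L * 1024 * 1024, 1L << 40};
        for (long size : sizes) {
            System.out.println(size + " bytes -> " + choose(size));
        }
    }
}
```

  Keeping the decision in one small function would make it easy to swap 
the thresholds or add further backends (e.g. MapFiles) later.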
  The FieldSearch would be the endpoint where the workflow engine should 
decide on which objects to operate, or the endpoint for the user who is 
trying to search through the whole repository for some metadata 
entry/datastream entry.
  The HBaseFieldSearch I implemented is nothing but a simple PoC, which 
operates on the HBase table structure I wrote about last time.
  But this is in no way how we think it should look in the end, and I 
think it's much more likely that we will operate on some kind of index, 
probably a Lucene index or even a whole Solr server.
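
  Just to illustrate the difference between the two approaches, here is 
a toy sketch in plain Java. It is not the PoC code and uses neither 
HBase nor Lucene; the in-memory table, the object ids and all method 
names are assumptions made up for the example. The scan looks at every 
row, while the index is built once and then answers a query with a 
single lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model: object id -> (field name -> value).
final class FieldSearchSketch {

    // Full scan with a filter: every row is inspected for each query.
    static List<String> scan(Map<String, Map<String, String>> table,
                             String field, String value) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (value.equals(row.getValue().get(field))) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Inverted index: "field=value" -> object ids, built once up front.
    static Map<String, List<String>> buildIndex(
            Map<String, Map<String, String>> table) {
        Map<String, List<String>> index = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            for (Map.Entry<String, String> f : row.getValue().entrySet()) {
                index.computeIfAbsent(f.getKey() + "=" + f.getValue(),
                                      k -> new ArrayList<>())
                     .add(row.getKey());
            }
        }
        return index;
    }

    // Index lookup: one map access instead of a pass over all rows.
    static List<String> lookup(Map<String, List<String>> index,
                               String field, String value) {
        return index.getOrDefault(field + "=" + value, new ArrayList<>());
    }

    // Made-up sample data; TreeMap keeps the row order deterministic.
    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> table = new TreeMap<>();
        table.put("demo:1", Map.of("title", "Report", "creator", "frank"));
        table.put("demo:2", Map.of("title", "Thesis", "creator", "aaron"));
        table.put("demo:3", Map.of("title", "Report", "creator", "aaron"));
        return table;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = sampleTable();
        System.out.println(scan(table, "title", "Report"));
        System.out.println(lookup(buildIndex(table), "title", "Report"));
    }
}
```

  A real Lucene or Solr index would of course also handle tokenizing, 
ranking and incremental updates, which this toy version ignores.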

Hope that clears up a bit what I'm trying to achieve, but please feel 
free to ask me any questions...

Kind regards,

Frank

-- 
Frank Asseg
ePublishing & eScience
Development & Applied Research
Phone +49 7247-808-515
Fax +49 7247 808-133
[email protected]


FIZ Karlsruhe – Leibniz Institute for Information Infrastructure
Hermann-von-Helmholtz-Platz 1
76344 Eggenstein-Leopoldshafen, Germany

http://www.fiz-karlsruhe.de/


-------------------------------------------------------

Fachinformationszentrum Karlsruhe, Gesellschaft für wissenschaftlich-technische 
Information mbH. 
Registered office: Eggenstein-Leopoldshafen, District Court (Amtsgericht) 
Mannheim, HRB 101892. 
Managing Director: Sabine Brünger-Weilandt. 
Chairman of the Supervisory Board: MinDirig Dr. Thomas Greiner.
