Re: [CODE4LIB] Indexing MARC(-JSON) with MongoDB?

MJ Suhonos Thu, 13 May 2010 07:08:04 -0700

> There's been some talk in code4lib about using MongoDB to store MARC
> records in some kind of JSON format. I'd like to know if you have
> experimented with indexing those documents in MongoDB. From my limited
> exposure to MongoDB, it seems difficult, unless MongoDB supports some
> kind of "custom indexing" functionality.


First things first: it depends on what kind of "indexing" you're looking to do.
I haven't worked with CouchDB (yet), but I have with MongoDB, and although
it's a great (and fast) data store, it has the same "basic" style of indexing
as SQL databases.  That is, you can do exact-match, some simple regex (usually
left-anchored) and then of course all the power of map/reduce (Mongo supports
map/reduce, just as Couch does).

Doing funkier full-text indexing is one of the priorities for upcoming MongoDB 
development, as I understand it.  In the interim, it might be worth having a 
look at ElasticSearch ( http://www.elasticsearch.com/ ).  It's based on Lucene and has 
its own DSL to support fuzzy querying.  I've been playing with it and it seems 
like a smart NoSQL implementation, albeit subtly different from Mongo or Couch.

>    { "fields" : [ ["001", "001 value"], ... ] }
> 
> or this
> 
>    { "controlfield" : [ { "tag" : "001", "data" : "fst01312614" }, ... ] }
> 
> How would you specify field 001 to MongoDB?

I think you would do this using dot notation, e.g.:

   db.records.find( { "controlfield.tag" : "001" } )

But I don't know enough about MARC-in-JSON to say exactly.  Have a look at:

http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29
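
As a rough sketch of what dot notation does with an array of subdocuments,
here is some plain JavaScript that only imitates the matching; it is not
MongoDB itself. The sample records and the helper names valuesAt and find are
made up for illustration, and I'm assuming the second JSON shape quoted above.

```javascript
// Sketch only: imitates how a dotted path such as "controlfield.tag" reaches
// into an array of subdocuments. This is NOT MongoDB's implementation.

// Hypothetical sample records in the "controlfield" shape quoted above.
const records = [
  { controlfield: [ { tag: "001", data: "fst01312614" },
                    { tag: "003", data: "OCoLC" } ] },
  { controlfield: [ { tag: "001", data: "fst00000001" } ] },
  { controlfield: [ { tag: "005", data: "20100513" } ] },
];

// Collect every value reachable via a dotted path, fanning out over arrays.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const node of current) {
      const items = Array.isArray(node) ? node : [node];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  return current.flat();
}

// A document matches when, for every path in the query, at least one
// reachable value equals the wanted value.
function find(docs, query) {
  return docs.filter(doc =>
    Object.entries(query).every(([path, wanted]) =>
      valuesAt(doc, path).includes(wanted)));
}

// Records that have an 001 control field at all.
const hasTag001 = find(records, { "controlfield.tag": "001" });

// Records where some control field carries this value.
const byData = find(records, { "controlfield.data": "fst01312614" });

// Caveat: each path is tested independently, so this matches the first
// record even though "OCoLC" lives in its 003 field, not its 001 field.
const crossMatch = find(records, {
  "controlfield.tag": "001",
  "controlfield.data": "OCoLC",
});
```

The last query is the catch: matching on the tag alone only says that a record
has an 001 field, and combining two dotted paths does not tie tag and data to
the same array element. In MongoDB I believe $elemMatch is the operator meant
for that, but check the docs before relying on it.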

> It would be nice to have some kind of custom indexing, where one could
> provide an index name and separately a JavaScript function specifying
> how to obtain the keys's values for that index.
> 
> Any suggestions? Do other document oriented databases offer a better
> solution for this?

My understanding is that indexes, in MongoDB at least, operate much like they 
do in an SQL RDBMS: they pre-sort field values (in a B-tree) for performance, 
rather than being required before a field can be queried.  I.e., I *believe* 
if you don't explicitly do an ensureIndex() on a field, you can still query 
it, but it'll be slower.  But I may be wrong.
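
To illustrate the idea (again plain JavaScript, not MongoDB): a query without
an index has to look at every document, while an index is a structure built
ahead of time that answers the same query directly. The field name "oclc",
the generated records, and the function names are all hypothetical.

```javascript
// Sketch only: the same lookup with and without a prebuilt index.

// Hypothetical collection of 1000 records with a made-up "oclc" field.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ _id: i, oclc: "fst" + String(i).padStart(8, "0") });
}

// No index: every document is examined on every query.
function scanFind(collection, field, value) {
  return collection.filter(d => d[field] === value);
}

// Roughly what an ensureIndex() buys you: a lookup structure built once.
// A Map stands in for the index here; it is not how MongoDB stores indexes.
function buildIndex(collection, field) {
  const index = new Map();
  for (const d of collection) {
    if (!index.has(d[field])) index.set(d[field], []);
    index.get(d[field]).push(d);
  }
  return index;
}

// With the index: go straight to the matching documents.
function indexFind(index, value) {
  return index.get(value) || [];
}

const oclcIndex = buildIndex(docs, "oclc");
const slow = scanFind(docs, "oclc", "fst00000042");
const fast = indexFind(oclcIndex, "fst00000042");
```

Both calls return the same documents; only the amount of work differs, which
is the sense in which the index is optional.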

> BTW, I fed MongoDB with the example MARC records in [2] and [3], and
> it choked on them. Both are missing some commas :-)
> 
> [1] http://www.mongodb.org/display/DOCS/Indexes
> [2] http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/
> [3] http://worldcat.org/devnet/wiki/MARC-JSON_Draft_2010-03-11

Not to start a flame war, but from my point of view, it seems rather strange 
for us to go through all this learning of new technology only to stuff MARC 
into it.  That's not to say it can't be done, or there aren't valid use cases 
for doing such a thing, but just that it seems like an odd juxtaposition.

I realize this is a bit at odds with my evangelizing at C4LN on "merging old 
and new", but really, being limited to the MARC data model with all the 
flexibility of NoSQL seems kind of like having a Ferrari and then setting the 
speed limiter at 50km/h.  Fun to drive, I _suppose_.

MJ
