With 50GB XML documents you're going to have many other problems. To couch this in terms of a relational database, that's like having a 50GB row in a table. In both cases, there's probably a better way to model your data to fit with the I/O and indexing patterns of the database.
Are these 50 GB documents actually many different documents aggregated into one big container? If so, you'll be much better off splitting them into individual documents in the 10 KB to 100 KB range, plus or minus one order of magnitude. That's the equivalent of a Debit Entry versus a General Ledger, an Article versus a Magazine, an Animal versus a Zoo, etc. You're going to have to tell us a little more about your data and queries in order to recommend something more specific, though.

Justin

> On Aug 27, 2015, at 12:12 PM, Yang, Yun <[email protected]> wrote:
>
> Thanks Justin for the suggestions. So for the smaller docs, the solution will
> work. What happens if the docs we have are big docs, for example, over 50 GB?
> Creating a new element inside the same big document would have an issue for
> opening and snippeting. Maybe create a separate doc to hold the first portion
> of the doc?
>
> Any suggestions on how to handle big docs?
>
> Thanks,
>
> Yun
>
> From: [email protected] On Behalf Of Justin Makeig
> Sent: Thursday, August 27, 2015 2:02 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Search in first paragraph or portion of
> the documents
>
> Wrap the "first portion" in a new element (assuming you're talking about XML
> here). Then you can use something like cts:element-query
> <http://docs.marklogic.com/cts:element-query> or cts:element-word-query
> <http://docs.marklogic.com/cts:element-word-query> to restrict queries to
> just that element. Think of the XML elements as a way to tell MarkLogic which
> specific parts of the document to index.
>
> Justin
>
> On Aug 27, 2015, at 11:57 AM, Yang, Yun <[email protected]> wrote:
>
> All,
>
> We have 20 million documents, and there is a use case where we must search
> only the first portion of each document. Is there a way to do that? The first
> portion of a document is defined as the first 50 words or 100 words, etc.
>
> Thanks,
>
> Yun
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
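
For illustration only, the two suggestions in this thread (split the big container into small documents, and wrap the "first portion" in its own element) could be combined in a preprocessing step before loading. The following is a minimal sketch in Python, not MarkLogic code and not something anyone in the thread posted. It assumes the records are direct children of the container's root element, and the names `split_records`, `record_tag`, and `first-portion` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def split_records(source, record_tag, first_words=50, wrapper_tag="first-portion"):
    """Stream an aggregate XML file and yield one small document per record.

    Assumption: every record is a direct child of the root element.
    Each yielded document gets a leading wrapper element holding its
    first `first_words` words, so a query can be scoped to that element.
    """
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # first event is the "start" of the root element

    for event, elem in context:
        if event != "end" or elem.tag != record_tag:
            continue

        # Collect the record's text in document order and keep the first N words.
        # Joining with a space means words are split at element boundaries.
        words = " ".join(elem.itertext()).split()[:first_words]

        wrapper = ET.Element(wrapper_tag)
        wrapper.text = " ".join(words)
        elem.insert(0, wrapper)

        elem.tail = None  # do not serialize whitespace that follows the record
        yield ET.tostring(elem, encoding="unicode")

        # Drop the records parsed so far so memory stays flat on huge inputs.
        root.clear()


if __name__ == "__main__":
    sample = (
        b"<zoo>"
        b"<animal><name>Lion</name><desc>A large cat that lives in prides</desc></animal>"
        b"<animal><name>Emu</name><desc>A tall flightless bird</desc></animal>"
        b"</zoo>"
    )
    for doc in split_records(io.BytesIO(sample), "animal", first_words=3):
        print(doc)
```

Each yielded string would then be loaded as its own document, and a query such as cts:element-word-query could be restricted to the wrapper element. Whether this split is appropriate depends on the actual data and queries, as noted above.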
