If you use search:search you may have a performance problem with highlighting. We've seen this happen with large text files, where MarkLogic could find the matching documents really fast but creating the highlighting was really slow. This is because it takes time to find the actual matching text within a large amount of text. We tried changing several indexing options, but nothing had any real effect on the highlighting step. Our solution was simply to add structure to any large text files. By converting them to XML and replacing line breaks with <p> elements, the highlighting became much faster.
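As a rough illustration of that conversion, here is a minimal sketch, assuming the text is restructured in Python before it is loaded into MarkLogic. The function name `text_to_paragraphs`, the `doc` root element, and the rule of one <p> per non-empty line are all assumptions made for the example, not a description of what we actually ran.

```python
from xml.sax.saxutils import escape


def text_to_paragraphs(text: str, root: str = "doc") -> str:
    """Wrap each non-empty line of `text` in a <p> element under <root>.

    Hypothetical helper: the root element name and the one-<p>-per-line
    rule are illustrative choices, not anything MarkLogic requires.
    """
    paragraphs = [
        f"<p>{escape(line.strip())}</p>"  # escape &, < and > in the text
        for line in text.splitlines()
        if line.strip()  # skip blank lines
    ]
    return f"<{root}>" + "".join(paragraphs) + f"</{root}>"


if __name__ == "__main__":
    sample = "First line of a large file.\nSecond line with A & B.\n\nThird line."
    print(text_to_paragraphs(sample))
```

The point is only that each <p> holds a small value, so whatever has to be retrieved and scanned for a match is a paragraph rather than the whole file. Splitting on sentences instead of lines would follow the same pattern.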
I think what happens is that MarkLogic indexes the exact values, the words, and the word positions (depending on your DB settings), so queries run fast as long as you never have to actually retrieve the entire value of the large text element. For highlighting, the entire value is retrieved and then parsed to find where in that value the match is, and that can take some time. I would expect a performance problem if your XPath or cts:queries have to be filtered, too, because the filtering step needs to open each candidate document and confirm that it does indeed match the query, which may involve parsing through that large text field. But with structure added, the element values to retrieve are much smaller (whatever is between the <p> tags), so highlighting and filtering are much faster. So I would add structure, perhaps for sentences, rather than keep one large text value. But test it out and see what you find.

-Ryan

To: general@developer.marklogic.com
From: abhishek5...@tcs.com
Date: Tue, 14 Jun 2011 16:43:48 +0530
Subject: Re: [MarkLogic Dev General] Marklogic database fields over large text

Thanks Geert!

I also expect that a field will share the indexing that is already there from the database configuration.

Regards
Abhishek Srivastav
Systems Engineer
Tata Consultancy Services
Cell:- +91-9883389968
Mailto: abhishek5...@tcs.com
Website: http://www.tcs.com
____________________________________________
Experience certainty. IT Services Business Solutions Outsourcing
____________________________________________

From: Geert Josten <geert.jos...@daidalos.nl>
To: General MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: 06/14/2011 04:10 PM
Subject: Re: [MarkLogic Dev General] Marklogic database fields over large text
Sent by: general-boun...@developer.marklogic.com

Hi Abhishek,

Right, database fields. I overlooked that bit in my previous reply. But most of it still stands.
The documentation (http://docs.marklogic.com/4.2doc/docapp.xqy#display.xqy?fname=http://pubs/4.2doc/xml/admin/fields.xml&query=database+field) states that a field-query is similar to a word-query, but constrained to specific elements. The word index is not optional; it is always there. I thought that element context was already available as well, so I would expect fields to reuse the available index information for the most part.

On the other hand, if your three elements already make up the major part of your documents, why use fields at all? Is it to explicitly exclude information in other elements? I don’t think using a field will give much of a performance improvement.

As always, however, the best way to know is to measure. Gajanan summarizes good alternatives. If performance is that important, then I recommend simply comparing the performance of all of them.. ;-)

Kind regards,
Geert

From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On behalf of Abhishek53 S
Sent: Tuesday, 14 June 2011 12:14
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Marklogic database fields over large text

Thanks Geert!

You are somewhat on the right track with my question. Say some indexing options are already enabled on my database. I need to enable the same options again on the field. Doesn't that duplicate the indexes and increase the index size? Correct me if I am wrong.

The reason I am worried about the index size is that the elements belonging to the database field make up the major portion of the total content. As I understand the indexing mechanism, once a document is ingested, both the database-level indexes and the field-level indexes are built, so settings common to both would consume duplicate space.

Thanks & Regards
Abhishek Srivastav

From: Geert Josten <geert.jos...@daidalos.nl>
To: General MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: 06/14/2011 01:31 PM
Subject: Re: [MarkLogic Dev General] Marklogic database fields over large text
Sent by: general-boun...@developer.marklogic.com

Hi Abhishek,

As far as I know, when talking about the word index, the index is what it says: an index of words. The element value is tokenized into unique words, and so is the search term of a word query. All individual words of the search term are looked up in the word index. After that, the results are filtered on other criteria. That way the length of the element value shouldn’t matter much.

Element value queries usually concern an exact match. The length of the element value shouldn’t make much difference there either. I am not aware of any length limitations on the value index..

Kind regards,
Geert

From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On behalf of Abhishek53 S
Sent: Tuesday, 14 June 2011 9:35
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Marklogic database fields over large text

Hi All,

My question is related to database fields. Sample content is as follows:

<sample>
  <title>EST</title>
  <data>
    <inner>
      <title>ABC.</title>
      <text1>Very large text.....Very large text.....Very large text.....Very large text.....Very large text.....</text1>
    </inner>
    <inner>
      <title>XYZ.</title>
      <text2>Very large text....Very large text.....Very large text.....Very large text.....Very large text......</text2>
    </inner>
    <inner>
      <title>RST</title>
      <text3>Very large text....Very large text.....Very large text.....Very large text.....Very large text......</text3>
    </inner>
  </data>
</sample>

I need to perform a constrained search over the elements text1, text2, and text3 under the data element.
If I choose the database field approach and add text1, text2, and text3 as included elements of the field so that I can write a constrained field query, is that a good approach with performance in mind, given that text1, text2, and text3 contain very large text?

Thanks in advance
Abhishek Srivastav

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general