Right. That's what I figured. Thanks Danny and William.

From: William Merritt Sawyer
Sent: Wednesday, February 27, 2013 3:17 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has range index
If you are running unfiltered, you can create a custom transform and do the check there:

    <transform-results apply="function-name" ns="module namespace" at="path to module" />

From: Danny Sokolsky
Sent: Wednesday, February 27, 2013 12:49 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has range index

I'm not positive, but I think you can just add:

    <search-option>unfiltered</search-option>

as a child of search:options, and then it would run unfiltered. Then you can do some kind of validation on the documents before you return them.

-Danny

From: Steiner, David J. (LNG-DAY)
Sent: Wednesday, February 27, 2013 11:40 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has range index

So, I guess I'd have to do that in a custom constraint, since there's not really a way to do that through the search API directly, correct?

From: Danny Sokolsky
Sent: Wednesday, February 27, 2013 2:28 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has range index

One other thought: if you do want to filter, you can always run the filter stage separately (using cts:contains) inside a try/catch and just throw away the ones that get the XDMP-CAST exception, or perhaps log the exception for later cleanup.
Something like this:

    xdmp:document-insert("/foo1.xml", <a><hello>123</hello></a>),
    xdmp:document-insert("/foo2.xml", <a><hello>abc</hello></a>)
    ;
    let $query := cts:element-range-query(xs:QName("hello"), "=", 123)
    for $x in cts:search(collection(), $query, "unfiltered")
    return (try { $x[cts:contains($x, $query)] } catch ($e) { $e })

-Danny

From: Danny Sokolsky
Sent: Wednesday, February 27, 2013 11:12 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has range index

Hi David,

"Ignore" just allows you to load the data without throwing an exception; it does not allow you to run those range queries against that data without throwing exceptions. The reason it does not allow that is that doing so would give a wrong answer. And the reason you are getting the exception is that you are running your search filtered. If you run it unfiltered, it will just give you the results as the indexes say they are. For example, the following will work (assuming an int index on "hello"), but it will return the "bad" data as well:

    xdmp:document-insert("/foo1.xml", <a><hello>123</hello></a>),
    xdmp:document-insert("/foo2.xml", <a><hello>abc</hello></a>)
    ;
    cts:search(collection(), cts:element-range-query(xs:QName("hello"), "=", 123), "unfiltered")

This returns both documents. If you run it filtered, it will give you the XDMP-CAST error, because you are asking the filter to run an invalid query. Make sense?

-Danny

From: Steiner, David J. (LNG-DAY)
Sent: Wednesday, February 27, 2013 10:53 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has range index

Hi Danny,

Yes, I know I have data in that element that doesn't cast. Actually, there are documents that have an empty element.
Essentially, I'd like for them simply to be ignored when the range index is created, but to continue to exist in the database with that element empty; there are other times I want to find those that are empty. Now, I can go and change the data, but this is not desirable. Essentially, it's a sparse-data problem, and part of the reporting is revealing these shortcomings.

What I want to figure out is why "ignore" doesn't work. Do I just not understand what it's supposed to do? Would "reject" do something different from what I'm thinking? (I think the whole document gets rejected; if the document is already in the database and you reindex after changing to "reject", does the document get removed from the database?) I had assumed that the range index was a way to bucketize ranges, but is there some other mechanism I should use when I know that an element I want to use might be empty?

Thanks,
David

From: Danny Sokolsky
Sent: Tuesday, February 26, 2013 8:19 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has range index

Hi David,

Do you have your file log level set to Debug? I think it needs to be Debug to see the reindexing errors.

I think the quickest path to a solution here will be to create a 1-document database that shows this issue. Then you will probably be able to figure out the issue. My guess is that you have data in that element that does not cast to your type. The fix would be to fix the data. Range queries will throw exceptions when the data is bad.

-Danny

From: Steiner, David J. (LNG-DAY)
Sent: Tuesday, February 26, 2013 1:49 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has range index

According to the Administrator's Guide, p. 157:

    Select the type for the range index. Note that the data must match the type; if it does not conform to the type specified for the range index, then new documents containing non-matching field data cannot be loaded and existing documents will not be able to be reindexed for the field (reindexing exceptions are logged to the ErrorLog.txt file).

However, I see no exceptions being logged. So, when I reindex, it does not appear that "existing documents will not be able to be reindexed for the field". If that were actually happening (which is what I want to happen), I would think that I wouldn't be getting the errors I am getting when I search.

From: Steiner, David J. (LNG-DAY)
Sent: Tuesday, February 26, 2013 1:38 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has range index

Hi Geert,

A full reindex (hitting the Reindex button on the Admin page for the database) didn't change my results; I still get the same errors.

Thanks anyway,
David

From: Geert Josten
Sent: Tuesday, February 26, 2013 10:14 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has range index

Hi David,

Perhaps gYear and int are treated the same internally. Switching to string and back will likely have more effect. But you can also 'force' a reindex by hitting the Reindex button on the Database configuration page in the Admin interface. Perhaps just as easy.

Kind regards,
Geert

From: Steiner, David J. (LNG-DAY)
Sent: Tuesday, February 26, 2013 4:00 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has range index

Hi Geert,

Well, I have the setting set to "ignore", and I tried to reindex with another element that I'm having the same problem with: I changed that one from gYear to int and then back to gYear. I would have thought that this would give two chances to "ignore" documents that had empty elements (once going to int, then once coming back to gYear), but I continue to get the same error with that element:

    [1.0-ml] XDMP-CAST: (err:FORG0001) cts:search(fn:collection(), cts:element-range-query(fn:QName("", "YEAR_ESTABLISHED"), "<", xs:gYear("1800"), (), 1), "score-logtfidf", 1) -- Invalid cast: xs:untypedAtomic("") cast as xs:gYear

So, at least this method of reindexing with "ignore" doesn't prevent the problem.

Kind regards,
David

From: Geert Josten
Sent: Tuesday, February 26, 2013 9:54 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has range index

Hi David,

I think I'd recommend getting rid of the empty elements, but perhaps you have reason to want to preserve them. In that case, changing the invalid values option to 'ignore' is by far the easiest. Just toggle the setting, save the change, and you should be able to insert documents that have empty elements. Otherwise they would get rejected.

I am not entirely sure about the existing documents that have an empty element. MarkLogic might automatically start reindexing and solve the issue for you. You should notice that soon enough if you change the setting: you should see a reindex running, and eventually your error message should disappear. If not, you will have to isolate those docs and reinsert/'touch' them.
:-/

Kind regards,
Geert

From: Steiner, David J. (LNG-DAY)
Sent: Tuesday, February 26, 2013 3:18 PM
To: 'MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Accounting for empty element that has range index

I have data that I loaded and then decided to put an element range index on. In some documents the element is empty. So, I think, when I try to bucketize the facet in a constraint, I'm getting an error because of these empty values; at least that's what it seems like. I don't get any error when I run "search:check-options" (with $strict set to true). I'm hoping that there's a way to access this range and ignore the empty values (or include them in a bucket), if possible without reprocessing data.

Here's the error:

    [1.0-ml] XDMP-CAST: (err:FORG0001) cts:search(fn:collection(), cts:and-query((cts:element-range-query(fn:QName("", "FIRM_SIZE"), ">=", xs:int("2"), (), 1), cts:element-range-query(fn:QName("", "FIRM_SIZE"), "<", xs:int("10"), (), 1)), ()), "score-logtfidf", 1) -- Invalid cast: xs:untypedAtomic("") cast as xs:int

Here's the constraint:

    <options xmlns="http://marklogic.com/appservices/search"
             xmlns:search="http://marklogic.com/appservices/search">
      <search:constraint name="firmSize">
        <search:range type="xs:int" facet="true">
          <search:bucket lt="2" ge="1" name="1">1</search:bucket>
          <search:bucket lt="10" ge="2" name="2through9">2-9</search:bucket>
          <search:bucket lt="20" ge="10" name="10through19">10-19</search:bucket>
          <search:bucket lt="50" ge="20" name="20through49">20-49</search:bucket>
          <search:bucket lt="100" ge="50" name="50through99">50-99</search:bucket>
          <search:bucket lt="200" ge="100" name="100through199">100-199</search:bucket>
          <search:bucket lt="500" ge="200" name="200through499">200-499</search:bucket>
          <search:bucket ge="500" name="Over500">Over500</search:bucket>
          <search:facet-option>limit=10</search:facet-option>
          <search:element ns="" name="FIRM_SIZE"/>
          <search:fragment-scope>documents</search:fragment-scope>
        </search:range>
      </search:constraint>
    </options>

I guess I need a bucket that categorizes "null" (empty)? Or will this just never work because I have documents that have empty elements for a range index element? If I have to reprocess, can I just reindex, or do I have to actually delete those elements? If I reindex, is it the setting for "invalid values" that needs to change? I would have thought that if it is set to "ignore", then an empty value wouldn't have been included in the range. I don't want the whole document "reject"ed, but maybe I just don't understand what that means. (To me it seems like the document wouldn't be loaded; but I want the document loaded. I just don't care to have it counted in a facet for which that element is empty, or I want a way to count the empty ones if there's no way to ignore them.)

Thanks,
David

NOTICE: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
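David's original question asks whether the empty FIRM_SIZE values can be skipped or counted in a bucket of their own. As the replies explain, MarkLogic resolves these buckets against the range index, so the snippet below is not how the search API computes facets. It is a minimal standalone sketch in plain XQuery 1.0. It assumes the FIRM_SIZE values have already been pulled into memory, and the sample values and the `bucket`/`count` element names are hypothetical. The sketch shows that casting an empty value to `xs:int` is what fails (the same err:FORG0001 reported in the XDMP-CAST message). It also shows how `castable as` can set such values aside so they are counted separately instead of raising an error.

```xquery
xquery version "1.0";

(: Hypothetical sample data standing in for FIRM_SIZE elements; not read
   from a database. The empty element is the kind that makes
   xs:int(.) raise err:FORG0001. :)
declare variable $sizes as element(FIRM_SIZE)* := (
  <FIRM_SIZE>1</FIRM_SIZE>,
  <FIRM_SIZE>5</FIRM_SIZE>,
  <FIRM_SIZE/>,
  <FIRM_SIZE>12</FIRM_SIZE>,
  <FIRM_SIZE>250</FIRM_SIZE>
);

(: Same bounds as the firmSize buckets: ge is inclusive, lt is exclusive,
   and a bucket without lt has no upper bound. :)
declare variable $buckets as element(bucket)* := (
  <bucket name="1" ge="1" lt="2"/>,
  <bucket name="2through9" ge="2" lt="10"/>,
  <bucket name="10through19" ge="10" lt="20"/>,
  <bucket name="20through49" ge="20" lt="50"/>,
  <bucket name="50through99" ge="50" lt="100"/>,
  <bucket name="100through199" ge="100" lt="200"/>,
  <bucket name="200through499" ge="200" lt="500"/>,
  <bucket name="Over500" ge="500"/>
);

(: "castable as" reports whether the cast would succeed instead of
   raising an error, so empty or non-numeric values can be split off. :)
let $valid   := $sizes[. castable as xs:int]
let $invalid := $sizes[fn:not(. castable as xs:int)]
return (
  for $b in $buckets
  let $hits := $valid[xs:int(.) ge xs:int($b/@ge)
                      and (fn:empty($b/@lt) or xs:int(.) lt xs:int($b/@lt))]
  return <count bucket="{$b/@name}">{fn:count($hits)}</count>,
  (: Extra "bucket" for the values that cannot be cast to xs:int. :)
  <count bucket="empty-or-invalid">{fn:count($invalid)}</count>
)
```

With the sample data above, the query returns a count of 1 for the 1, 2through9, 10through19 and 200through499 buckets, 0 for the other four, and 1 for empty-or-invalid. Note that `castable as` sets aside any value that is not an integer (for example "abc"), not only empty ones. Because this uses only standard XQuery 1.0, it should also run in MarkLogic, but that has not been verified here.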
_______________________________________________
General mailing list
http://developer.marklogic.com/mailman/listinfo/general
