Hi David,
"Ignore" just allows you to load the data without throwing an exception - it
does not allow you to run those range queries against that data without
throwing exceptions. It does not allow that because doing so would give a
wrong answer. You are getting the error because you are running your search
filtered. If you run it unfiltered, it will just give you the results as the
indexes say they are. For example, the following will work (assuming an int
index on "hello"), but it will return the "bad" data as well:
xdmp:document-insert("/foo1.xml", <a><hello>123</hello></a>),
xdmp:document-insert("/foo2.xml", <a><hello>abc</hello></a>) ;
cts:search(collection(), cts:element-range-query(xs:QName("hello"), "=", 123),
"unfiltered")
=> returns both documents
If you run this filtered, it will give you the XDMP-CAST error, because you are
asking the filter to run an invalid query.
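Applied to the FIRM_SIZE query from the original post, the same idea might look like the sketch below. It assumes an xs:int element range index on FIRM_SIZE with no namespace, and it only reproduces the 2-9 bucket. Unfiltered results come straight from the indexes, so they are only as accurate as the indexes are.

```xquery
xquery version "1.0-ml";

(: Sketch only: the 2-9 bucket from the original post, run unfiltered.
   Assumes an xs:int element range index on FIRM_SIZE (no namespace). :)
cts:search(
  fn:collection(),
  cts:and-query((
    cts:element-range-query(xs:QName("FIRM_SIZE"), ">=", xs:int(2)),
    cts:element-range-query(xs:QName("FIRM_SIZE"), "<", xs:int(10))
  )),
  "unfiltered"
)
```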
Make sense?
-Danny
From: [email protected] On Behalf Of Steiner, David J. (LNG-DAY)
Sent: Wednesday, February 27, 2013 10:53 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has
range index
Hi Danny,
Yes, I know I have data in that element that doesn't CAST. Actually, there are
documents that have an empty element. Essentially, I'd like for them simply to
be ignored when the range index is created, but continue to exist in the DB
with that element empty - there are other times I want to find those that are
empty.
Now, I can go and change the data, but this is not desirable. Essentially it's
a sparse data problem that is part of revealing these shortcomings through
reporting.
What I want to figure out is why "ignore" doesn't work? Do I just not
understand what it's supposed to do? Would "reject" do something different
than what I'm thinking (which is that the whole document gets rejected - if the
doc is already in the DB and you re-index after changing to "reject", does the
document get removed from the DB)?
I had assumed that the range index was a way to bucketize ranges, but is there
some other mechanism I should use when I know that it's possible that an
element I want to use might be empty?
Thanks,
David
From: [email protected] On Behalf Of Danny Sokolsky
Sent: Tuesday, February 26, 2013 8:19 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has
range index
Hi David,
Do you have your file log level turned to Debug? I think it needs to be debug
to see the reindexing errors.
I think the quickest path to a solution here will be to create a 1-document
database that shows this issue. Then you will probably be able to figure out
the issue. My guess is that you have data in that element that does not cast
to your type. The fix would be to fix the data. Range queries will throw
exceptions when the data is bad.
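One way to look for such data, as a rough sketch: list the URIs of documents whose element is empty or otherwise does not cast to the index type. The element name FIRM_SIZE and the type xs:int are taken from the original post. This walks every document rather than using an index, so it is probably only reasonable on a small or test database.

```xquery
xquery version "1.0-ml";

(: Sketch only: report documents with a FIRM_SIZE value that is empty
   or not castable to xs:int. Scans all documents, so it may be slow. :)
for $doc in fn:collection()
where some $e in $doc//FIRM_SIZE satisfies fn:not($e castable as xs:int)
return xdmp:node-uri($doc)
```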
-Danny
From: [email protected] On Behalf Of Steiner, David J. (LNG-DAY)
Sent: Tuesday, February 26, 2013 1:49 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has
range index
According to the Administrator's guide, pg. 157:
Select the type for the range index. Note that the data must match the type; if
it does not conform to the type specified for the range index, then new
documents containing non-matching field data cannot be loaded and existing
documents will not be able to be reindexed for the field (reindexing exceptions
are logged to the ErrorLog.txt file).
However, I see no exceptions being logged.
So, when I re-index, it does not appear that "existing documents will not be
able to be reindexed for the field". If that were actually happening, which is
what I want to happen, I would think that I wouldn't be getting the errors I am
when trying to search.
From: [email protected] On Behalf Of Steiner, David J. (LNG-DAY)
Sent: Tuesday, February 26, 2013 1:38 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has
range index
Hi Geert,
Full reindex (hitting reindex button on Admin for DB) didn't change my results
- still get the same errors.
Thanks anyway,
David
From: [email protected] On Behalf Of Geert Josten
Sent: Tuesday, February 26, 2013 10:14 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has
range index
Hi David,
Perhaps gYear and int are treated the same internally. Switching to string and
back likely has more effect. But you can also 'force' a reindex by hitting the
Reindex button on the Database configuration page in the Admin interface.
Perhaps just as easy.
Kind regards,
Geert
From: [email protected] On Behalf Of Steiner, David J. (LNG-DAY)
Sent: Tuesday, February 26, 2013 4:00 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has
range index
Hi Geert,
Well, I have the setting set to "ignore" and I tried to reindex with another
element that I'm having the same problem with - I changed that one from gYear
to int and then back to gYear. I would've thought that this would have two
chances to "ignore" documents that had empty elements (once going to int, then
once coming back to gYear), but I continue to get the same error with that
element:
[1.0-ml] XDMP-CAST: (err:FORG0001) cts:search(fn:collection(),
cts:element-range-query(fn:QName("", "YEAR_ESTABLISHED"), "<",
xs:gYear("1800"), (), 1), "score-logtfidf", 1) -- Invalid cast:
xs:untypedAtomic("") cast as xs:gYear
So, at least this method of re-indexing with "ignore" doesn't prevent the
problem.
Kind regards,
David
From: [email protected] On Behalf Of Geert Josten
Sent: Tuesday, February 26, 2013 9:54 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has
range index
Hi David,
I think I'd recommend getting rid of the empty elements, but perhaps you have
reason to want to preserve them. In that case, changing the invalid values
option to 'ignore' is by far the easiest. Just toggle the setting, save the
change, and you should be able to insert documents that have empty elements.
Otherwise they would get rejected.
I am not entirely sure about the existing documents that have an empty element.
MarkLogic might automatically start reindexing, and solve the issue for you.
But you should notice that soon enough if you change the setting. You should
see a reindex running, and eventually your error message should disappear.
If not, you will have to isolate those docs, and reinsert/'touch' them.. :-/
Kind regards,
Geert
From: [email protected] On Behalf Of Steiner, David J. (LNG-DAY)
Sent: Tuesday, February 26, 2013 3:18 PM
To: 'MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Accounting for empty element that has range
index
I have data that I loaded and then decided to put an element range index on.
In some documents the element is empty. So, I think, when I try to bucketize
the facet in a constraint, I'm getting an error because of these empty values -
at least that's what it seems like. I don't get any error when I run
"search:check-options" (with $strict set to true). I'm hoping that there's a
way to access this range and ignore the empty values (or include them in a
bucket) if possible without reprocessing data.
Here's the error:
[1.0-ml] XDMP-CAST: (err:FORG0001) cts:search(fn:collection(),
cts:and-query((cts:element-range-query(fn:QName("", "FIRM_SIZE"), ">=",
xs:int("2"), (), 1), cts:element-range-query(fn:QName("", "FIRM_SIZE"), "<",
xs:int("10"), (), 1)), ()), "score-logtfidf", 1) -- Invalid cast:
xs:untypedAtomic("") cast as xs:int
Here's the constraint:
<options xmlns="http://marklogic.com/appservices/search">
  <search:constraint name="firmSize">
    <search:range type="xs:int" facet="true">
      <search:bucket lt="2" ge="1" name="1">1</search:bucket>
      <search:bucket lt="10" ge="2" name="2through9">2-9</search:bucket>
      <search:bucket lt="20" ge="10" name="10through19">10-19</search:bucket>
      <search:bucket lt="50" ge="20" name="20through49">20-49</search:bucket>
      <search:bucket lt="100" ge="50" name="50through99">50-99</search:bucket>
      <search:bucket lt="200" ge="100" name="100through199">100-199</search:bucket>
      <search:bucket lt="500" ge="200" name="200through499">200-499</search:bucket>
      <search:bucket ge="500" name="Over500">Over500</search:bucket>
      <search:facet-option>limit=10</search:facet-option>
      <search:element ns="" name="FIRM_SIZE"/>
      <search:fragment-scope>documents</search:fragment-scope>
    </search:range>
  </search:constraint>
</options>
I guess I need a bucket that categorizes "null" (empty)? Or will this just
never work because I have documents that have empty elements for a range index
element?
If I have to reprocess, can I just re-index, or do I have to actually delete
those elements?
If I re-index, is it the setting for "invalid values" that needs to change? I
would've thought that if it is set to "ignore" then an empty value wouldn't
have been included in the range. I don't want the whole document "reject"ed,
but maybe I just don't understand what that means (to me it seems like the
document wouldn't be loaded; but I want the document loaded, I just don't care
to have it counted in a facet for which that element is empty, or a way to
count the empty ones if there's not a way to ignore them).
Thanks,
David
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general