Hi Danny,

Yes, I know I have data in that element that doesn't CAST.  Actually, there are 
documents that have an empty element.  Essentially, I'd like for them simply to 
be ignored when the range index is created, but continue to exist in the DB 
with that element empty - there are other times I want to find those that are 
empty.
Now, I can go and change the data, but this is not desirable.  Essentially it's 
a sparse data problem that is part of revealing these shortcomings through 
reporting.

What I want to figure out is why "ignore" doesn't work?  Do I just not 
understand what it's supposed to do?  Would "reject" do something different 
than what I'm thinking (which is that the whole document gets rejected - if the 
doc is already in the DB and you re-index after changing to "reject", does the 
document get removed from the DB)?

I had assumed that the range index was a way to bucketize ranges, but is there 
some other mechanism I should use when I know that it's possible that an 
element I want to use might be empty?

Thanks,
David

From: [email protected] 
[mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Tuesday, February 26, 2013 8:19 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has 
range index

Hi David,

Do you have your file log level turned to Debug?  I think it needs to be debug 
to see the reindexing errors.

I think the quickest path to a solution here will be to create a 1-document 
database that shows this issue.  Then you will probably be able to figure out 
the issue.  My guess is that you have data in that element that does not cast 
to your type.  The fix would be to fix the data.  Range queries will throw 
exceptions when the data is bad.

-Danny

From: [email protected] 
[mailto:[email protected]] On Behalf Of Steiner, David J. 
(LNG-DAY)
Sent: Tuesday, February 26, 2013 1:49 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has 
range index

According to the Administrator's guide, pg. 157:
Select the type for the range index. Note that the data must match the type; if 
it does not conform to the type specified for the range index, then new 
documents containing non-matching field data cannot be loaded and existing 
documents will not be able to be reindexed for the field (reindexing exceptions 
are logged to the ErrorLog.txt file).

However, I see no exceptions being logged.

So, when I re-index, it does not appear that "existing documents will not be 
able to be reindexed for the field".  If that were actually happening, which is 
what I want to happen, I would think that I wouldn't be getting the errors I am 
when trying to search.

From: [email protected] 
[mailto:[email protected]] On Behalf Of Steiner, David J. 
(LNG-DAY)
Sent: Tuesday, February 26, 2013 1:38 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has 
range index

Hi Geert,

Full reindex (hitting reindex button on Admin for DB) didn't change my results 
- still get the same errors.

Thanks anyway,
David

From: [email protected] 
[mailto:[email protected]] On Behalf Of Geert Josten
Sent: Tuesday, February 26, 2013 10:14 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has 
range index

Hi David,

Perhaps gYear and int are treated the same internally. Switching to string and 
back is likely to have more effect. But you can also 'force' a reindex by 
hitting the Reindex button on the Database configuration page in the Admin 
interface. Perhaps just as easy.

Kind regards,
Geert

From: [email protected] 
[mailto:[email protected]] On Behalf Of Steiner, David J. 
(LNG-DAY)
Sent: Tuesday, February 26, 2013 4:00 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has 
range index

Hi Geert,

Well, I have the setting set to "ignore" and I tried to reindex with another 
element that I'm having the same problem with - I changed that one from gYear 
to int and then back to gYear.  I would've thought that this would have two 
chances to "ignore" documents that had empty elements (once going to int, then 
once coming back to gYear), but I continue to get the same error with that 
element:

[1.0-ml] XDMP-CAST: (err:FORG0001) cts:search(fn:collection(), 
cts:element-range-query(fn:QName("", "YEAR_ESTABLISHED"), "<", 
xs:gYear("1800"), (), 1), "score-logtfidf", 1) -- Invalid cast: 
xs:untypedAtomic("") cast as xs:gYear

So, at least this method of re-indexing with "ignore" doesn't prevent the 
problem.
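
For reference, the cast failure in the message above can be reproduced without 
any index at all. This is only a minimal sketch in plain XQuery; the sample 
values are made up for illustration and are not taken from the real data. It 
shows that the empty string is not a valid lexical form for xs:gYear, which 
is the cast the error message reports as invalid:

```xquery
xquery version "1.0-ml";

(: Hypothetical sample values; "" stands in for an empty YEAR_ESTABLISHED. :)
let $values := ("1800", "", "2013")
for $v in $values
return
  if ($v castable as xs:gYear)
  then xs:gYear($v)                            (: safe to cast :)
  else fn:concat("not castable: '", $v, "'")   (: the case that raises XDMP-CAST :)
```

The same check with xs:int in place of xs:gYear applies to the FIRM_SIZE case.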

Kind regards,
David

From: [email protected] 
[mailto:[email protected]] On Behalf Of Geert Josten
Sent: Tuesday, February 26, 2013 9:54 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Accounting for empty element that has 
range index

Hi David,

I think I'd recommend getting rid of the empty elements, but perhaps you have 
reason to want to preserve them. In that case, changing the invalid values 
option to 'ignore' is by far the easiest. Just toggle the setting, save the 
change, and you should be able to insert documents that have empty elements. 
Otherwise they would get rejected.

I am not entirely sure about the existing documents that have an empty element. 
MarkLogic might automatically start reindexing, and solve the issue for you. 
But you should notice that soon enough if you change the setting. You should 
see a reindex running, and eventually your error message should disappear.

If not, you will have to isolate those docs, and reinsert/'touch' them.. :-/
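
One possible way to isolate those docs is sketched here. It is an assumption 
about how one might do it, not a tested recipe: it uses an unindexed XPath scan 
over every document, so it may be slow on a large database, and it assumes the 
element is FIRM_SIZE in no namespace with an xs:int range index, as in the 
constraint quoted further down.

```xquery
xquery version "1.0-ml";

(: List the URIs of documents whose FIRM_SIZE is empty or otherwise
   does not cast to xs:int. This does not use the range index. :)
fn:distinct-values(
  for $e in fn:collection()//FIRM_SIZE[fn:not(fn:string(.) castable as xs:int)]
  return xdmp:node-uri($e)
)
```

The resulting URIs are the candidates to reinsert or 'touch'.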

Kind regards,
Geert

From: [email protected] 
[mailto:[email protected]] On Behalf Of Steiner, David J. 
(LNG-DAY)
Sent: Tuesday, February 26, 2013 3:18 PM
To: 'MarkLogic Developer Discussion'
Subject: [MarkLogic Dev General] Accounting for empty element that has range 
index

I have data that I loaded and then decided to put an element range index on.  
In some documents the element is empty.  So, I think, when I try to bucketize 
the facet in a constraint, I'm getting an error because of these empty values - 
at least that's what it seems like.  I don't get any error when I run 
"search:check-options" (with $strict set to true).  I'm hoping that there's a 
way to access this range and ignore the empty values (or include them in a 
bucket) if possible without reprocessing data.

Here's the error:
[1.0-ml] XDMP-CAST: (err:FORG0001) cts:search(fn:collection(), 
cts:and-query((cts:element-range-query(fn:QName("", "FIRM_SIZE"), ">=", 
xs:int("2"), (), 1), cts:element-range-query(fn:QName("", "FIRM_SIZE"), "<", 
xs:int("10"), (), 1)), ()), "score-logtfidf", 1) -- Invalid cast: 
xs:untypedAtomic("") cast as xs:int

Here's the constraint:
<options xmlns="http://marklogic.com/appservices/search">
  <search:constraint name="firmSize">
    <search:range type="xs:int" facet="true">
      <search:bucket lt="2" ge="1" name="1">1</search:bucket>
      <search:bucket lt="10" ge="2" name="2through9">2-9</search:bucket>
      <search:bucket lt="20" ge="10" name="10through19">10-19</search:bucket>
      <search:bucket lt="50" ge="20" name="20through49">20-49</search:bucket>
      <search:bucket lt="100" ge="50" name="50through99">50-99</search:bucket>
      <search:bucket lt="200" ge="100" name="100through199">100-199</search:bucket>
      <search:bucket lt="500" ge="200" name="200through499">200-499</search:bucket>
      <search:bucket ge="500" name="Over500">Over500</search:bucket>
      <search:facet-option>limit=10</search:facet-option>
      <search:element ns="" name="FIRM_SIZE"/>
      <search:fragment-scope>documents</search:fragment-scope>
    </search:range>
  </search:constraint>
</options>


I guess I need a bucket that categorizes "null" (empty)?  Or will this just 
never work because I have documents that have empty elements for a range index 
element?

If I have to reprocess, can I just re-index, or do I have to actually delete 
those elements?

If I re-index, is it the setting for "invalid values" that needs to change?  I 
would've thought that if it is set to "ignore" then an empty value wouldn't 
have been included in the range.  I don't want the whole document "reject"ed, 
but maybe I just don't understand what that means (to me it seems like the 
document wouldn't be loaded; but I want the document loaded, I just don't care 
to have it counted in a facet for which that element is empty, or a way to 
count the empty ones if there's not a way to ignore them).

Thanks,
David


_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
