Hi Sue,

I've been diagnosing this issue in Luke (a Java GUI that allows you to browse
your Lucene index). While digging around, I found that the
sort_dateissued field seems to have trouble with certain date metadata.

In our repository, the date metadata values come in several different
precisions:
1981-12-07T16:56:12Z
1981-12-07
1981-12
1981

Each one of them is a valid ISO 8601 date. However, that doesn't mean each
of them is a valid date in Lucene (your search and browse index). A
metadata person might see 1981-12 as meaning some type of range or
approximation. However, when you are searching and sorting, the index must
be able to order the values precisely. So is 1981-12 before or after
1981-12-07? Is 1981-12 before, after, or equal to 1981-12-01?

I'll ask my metadata people if we can flatten our date metadata and pad
each value so that it includes a day (the first of the month).
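
As a rough sketch of what that padding could look like (the class and
method names are made up for illustration, this is not DSpace code, and it
assumes the four ISO 8601 shapes listed above are the only ones we have):

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.Year;
import java.time.YearMonth;

// Hypothetical helper, not part of DSpace.
final class PartialDatePadder {

    // Pads yyyy, yyyy-MM, yyyy-MM-dd, or a full timestamp to yyyy-MM-dd.
    // A missing month or day is assumed to mean "the first".
    // Anything else throws java.time.format.DateTimeParseException.
    static String padToDay(String value) {
        String v = value.trim();
        switch (v.length()) {
            case 4:   // 1981
                return Year.parse(v).atDay(1).toString();
            case 7:   // 1981-12
                return YearMonth.parse(v).atDay(1).toString();
            case 10:  // 1981-12-07
                return LocalDate.parse(v).toString();
            default:  // 1981-12-07T16:56:12Z
                return OffsetDateTime.parse(v).toLocalDate().toString();
        }
    }
}
```

With this, padToDay("1981-12") gives 1981-12-01 and padToDay("1981") gives
1981-01-01. Whether "first of the month" is the right reading of 1981-12 is
a call for the metadata people, not something the code can decide.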

I'll also dig further into the DSpace reindexing code to see whether, when
we process DSpace metadata dates that are valid ISO 8601, we convert them to
an appropriate Lucene date.
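
To illustrate the kind of conversion I have in mind, here is a sketch that
turns any of the date shapes above into a fixed-width key, so that plain
string order matches chronological order. The yyyyMMddHHmm layout is an
assumption, picked because it resembles the 201201010000 values Cristian
describes seeing in Luke; the class name is hypothetical and this does not
use the Lucene API or show what DSpace actually does at index time.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.Year;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of DSpace or Lucene.
final class DateSortKey {

    // Assumed key layout: 12 digits, minute precision, UTC.
    private static final DateTimeFormatter KEY =
            DateTimeFormatter.ofPattern("uuuuMMddHHmm");

    // Missing month, day, or time is filled with the earliest possible value.
    static String toSortKey(String value) {
        String v = value.trim();
        LocalDateTime t;
        switch (v.length()) {
            case 4:   // 1981
                t = Year.parse(v).atDay(1).atStartOfDay();
                break;
            case 7:   // 1981-12
                t = YearMonth.parse(v).atDay(1).atStartOfDay();
                break;
            case 10:  // 1981-12-07
                t = LocalDate.parse(v).atStartOfDay();
                break;
            default:  // 1981-12-07T16:56:12Z, normalized to UTC
                t = OffsetDateTime.parse(v)
                        .withOffsetSameInstant(ZoneOffset.UTC)
                        .toLocalDateTime();
        }
        return t.format(KEY);
    }
}
```

Under these assumptions 1981-12 and 1981-12-01 both map to 198112010000, so
they sort as equal, and both sort before 1981-12-07 (198112070000).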


Peter Dietz



On Thu, Feb 2, 2012 at 11:55 PM, Thornton, Susan M. (LARC-B702)[LITES] <
[email protected]> wrote:

>  Thanks Peter.  I've spent several hours researching this issue,
> especially why we have it in one DSpace instance and not another (running
> same versions).  Although I'm not 100% sure, I suspect the issue is caused
> by invalid data in the date.issued field(s) in the repository.  The
> solution for this, of course, would be to clean up the bad dates we have
> and then put some edits on the date fields that end up in DSpace so we do
> not allow bad dates to get IN our repository.  But again, I'm not 100% sure
> of this and I won't be able to get back to looking into this for awhile.
> Best regards,
> Sue
>
>  Sue Walker-Thornton
> Software Developer|Database Administrator
> NASA Langley Research Center
> SGT, Inc.|LITES Contract
> 130 Research Drive
> Hampton, VA  23666
> Office: (757) 864-2368|Fax: (757) 224-4001|Mobile: (757) 506-9903
> Email:  [email protected]
>  ------------------------------
> *From:* Peter Dietz [[email protected]]
> *Sent:* Wednesday, February 01, 2012 12:29 PM
> *To:* Cristian Romanescu
> *Cc:* [email protected]
> *Subject:* Re: [Dspace-tech] search can't sort by date issued
>
>  Hi All,
>
> I've just started digging into this as well. It's really unfortunate to
> only get "relevance" results for searches.
>
> In digging in, I've spit out the stack trace, and it's telling me a few
> things.
> 1) Do we have "bad" metadata for dc.date.issued?
> -- (I've already harassed my content folks to have them review all our
> metadata) ;)
>
> 2) Are we doing the comparison of dates incorrectly? The error below asks
> whether the value of "dateissued" is really an INT.
> -- I've been reading this thread, which is very similar:
> http://www.gossamer-threads.com/lists/lucene/java-user/109530
>
>
> 2012-01-31 17:47:02,475 ERROR org.dspace.search.DSQuery @ Unable to use speficied sort option: dateissued
> 2012-01-31 17:47:02,475 ERROR org.dspace.search.DSQuery @ Invalid shift value in prefixCoded string (is encoded value really an INT?)
> 2012-01-31 17:47:02,476 ERROR org.dspace.search.DSQuery @ java.lang.NumberFormatException: Invalid shift value in prefixCoded string (is encoded value really an INT?)
> at org.apache.lucene.util.NumericUtils.prefixCodedToInt(NumericUtils.java:233)
> at org.apache.lucene.search.FieldCache$7.parseInt(FieldCache.java:237)
> at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:457)
> at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
> at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
> at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:447)
> at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
> at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
> at org.apache.lucene.search.FieldComparator$IntComparator.setNextReader(FieldComparator.java:332)
> at org.apache.lucene.search.TopFieldCollector$MultiComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:435)
> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:249)
> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:240)
> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:181)
> at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:113)
> at org.apache.lucene.search.Hits.<init>(Hits.java:90)
> at org.apache.lucene.search.Searcher.search(Searcher.java:63)
> at org.dspace.search.DSQuery.doQuery(DSQuery.java:151)
> at org.dspace.search.DSQuery.doQuery(DSQuery.java:309)
> at org.dspace.app.xmlui.aspect.artifactbrowser.AbstractSearch.performSearch(AbstractSearch.java:438)
>
>
> Just for fun, I enabled Discovery on our development machines, and sorting
> by date issued works perfectly in a search. So, a quick fix would be to
> switch to using Discovery. Nonetheless, I look forward to getting a
> resolution to this issue.
>
>
> Peter Dietz
>
>
>
> On Wed, Feb 1, 2012 at 7:15 AM, Cristian Romanescu <
> [email protected]> wrote:
>
>> Greetings,
>>
>> Have you tried looking into the Lucene indexes with the Luke tool
>> (http://www.getopt.org/luke/)?
>> We are using:
>>      search.index.13 = dc_date:dc.date.issued:date
>> to filter by time interval and it works.
>>
>> But first, we had to remove the old indexes and re-create them to get
>> correct indexing (i.e. rm -rf $builddir/search and run
>> ./$builddir/bin/dspace index-init). It only worked once the data inside
>> the index looked like 201201010000 ... when viewed with the Luke tool.
>>
>> HTH,
>> Cristian
>>
>>
>> On 02/01/2012 12:46 PM, Päivi Rosenström wrote:
>> > Any solution for this found yet ?
>> >
>> >
>> > Thanks!
>> >
>> > Päivi
>> >
>> >
>> >> Re: [Dspace-tech] search can't sort by date issued
>> >> From: James Bardin<jbardin@bu...>  - 2011-10-27 19:23
>> >> On Thu, Oct 27, 2011 at 1:52 PM, Blanco, Jose<blancoj@...>  wrote:
>> >>> # Browse indexes
>> >>> webui.browse.index.1 = title:item:title
>> >>> webui.browse.index.2 = author:metadata:dc.contributor.author:text
>> >>> webui.browse.index.3 = subject:metadata:dc.subject.*:text
>> >>> webui.browse.index.4 = dateissued:item:dateissued
>> >>>
>> >>> # Sorting options
>> >>> webui.itemlist.sort-option.1 = title:dc.title:title
>> >>> webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
>> >>> webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
>> >>>
>> >> Yeah, I have dateissued in both the browse.index and sort-option, like
>> >> above.
>> >> Sorting by dateissued *does* work in browsing, but not for search
>> >> results (I think search result ordering is done by lucene, and not the
>> >> webui). I took a guess and added another search index for
>> >> dateissued:dc.date.issued:date, but that doesn't seem to have any
>> >> effect.
>> >
>> >> -jim
>> >
>> >
>> >
>> ------------------------------------------------------------------------------
>> > Keep Your Developer Skills Current with LearnDevNow!
>> > The most comprehensive online learning library for Microsoft developers
>> > is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
>> > Metro Style Apps, more. Free future releases when you subscribe now!
>> > http://p.sf.net/sfu/learndevnow-d2d
>> > _______________________________________________
>> > DSpace-tech mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/dspace-tech
>>
>>
>>
>>
>
>