[ 
https://issues.apache.org/jira/browse/OODT-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558323#comment-13558323
 ] 

Chris A. Mattmann commented on OODT-551:
----------------------------------------

Hey BFost -- first off, yes, I agree with you. Long have I preached to you and 
others that the Metadata key-values structure implies no ordering of the 
values. However, a real side effect for years has been that the Lucene catalog 
on persistence has maintained such an order (it still does -- the keys are 
unordered b/c it used to be a hash map, but the values in it have always been 
ordered). This is an artifact of the way that Lucene stores/persists fields. 
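
To make the keys-versus-values distinction concrete, here is a minimal sketch. It is neither OODT's Metadata class nor anything Lucene does internally; the class and method names are hypothetical. It only shows a multi-valued structure where each key's values come back in insertion order even though the keys live in a hash map and have no defined order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the OODT Metadata implementation.
public class MultiValuedMetadataSketch {

    // HashMap: iteration order of the keys is unspecified.
    // ArrayList: the values under a single key keep insertion order.
    private final Map<String, List<String>> fields = new HashMap<>();

    public void add(String key, String value) {
        fields.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public List<String> values(String key) {
        return fields.getOrDefault(key, new ArrayList<>());
    }

    public static void main(String[] args) {
        MultiValuedMetadataSketch met = new MultiValuedMetadataSketch();
        met.add("Antenna", "a1");
        met.add("Antenna", "a2");
        met.add("Antenna", "a3");
        // Values for one key are returned in the order they were added.
        System.out.println(met.values("Antenna"));
    }
}
```

Code that pairs the i-th value of one key with the i-th value of another only works if every catalog backend returns values in this same order.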

That being said, the DataSourceCatalog has never preserved these semantics. It's 
always, as you've said, been whatever order the values were inserted into it. 
Values for Metadata were ordered before the switch over to the Metadata group 
style (and away from the HashMap), yet even then, entering those values into 
the DataSourceCatalog made them unordered. 

Luca and I noticed this on a JPL project ("VFASTR" a radio astronomy project) 
where we followed the classic pattern of starting out with Lucene, waiting until 
it doesn't scale anymore, then moving on to the DataSourceCatalog and a DB. When 
doing so, a bunch of downstream code for VFASTR broke b/c all along we had made 
the (incorrect) assumption that the values were ordered b/c that's the behavior 
we were seeing with the LuceneCatalog. Since I'm not coding too much on that 
project, and since Luca is, and since Luca didn't have the history, he and 
Andrew and others wrote all that code assuming that all would be well, and then 
when we switched to the DataSourceCatalog, all hell broke loose heh ;)

So, Luca had done some testing and tried to come up with something in 
VFASTR-land that would work and that was the "fix"/"updates" to the 
DataSourceCatalog. I encouraged him to not just keep it in JPL project ville, 
but to bring it up to Apache and contribute it back. I think there wasn't 
enough time to discuss that contribution which is why I rolled the rev back and 
opened it up for discussion like we're having here.

So my proposal was to:

# introduce a property into DataSourceCatalog for ordering metadata fields. By 
default, it's turned off (to preserve the prior behavior of not maintaining 
that ordering). When it's turned on, a 3-4 line functionality patch adds an 
ORDER BY clause to the SQL returning met values, and assumes that the person 
deploying OODT has already installed a simple schema update to handle that 
pkey.
# make sure this is also unit tested
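
A rough sketch of what the property-guarded ORDER BY could look like is below. This is not the attached patch. The property name, the table and column names (including a column literally called pkey), and the class itself are all assumptions made for illustration; the real DataSourceCatalog code and schema may differ.

```java
// Hypothetical sketch; names below are assumptions, not existing OODT code.
public class OrderedMetadataQuery {

    // Assumed property name; off unless explicitly set to "true".
    public static final String ORDERED_PROP =
        "example.filemgr.catalog.datasource.orderedValues";

    // Builds the SELECT that returns the met values for one product.
    // The ORDER BY is appended only when ordering is enabled, so the
    // default behavior (no guaranteed order) is unchanged.
    public static String buildMetadataQuery(String metadataTable,
                                            boolean orderedValues) {
        StringBuilder sql = new StringBuilder(
            "SELECT element_id, metadata_value FROM ")
            .append(metadataTable)
            .append(" WHERE product_id = ?");
        if (orderedValues) {
            // Assumes the schema update has already added a pkey column.
            sql.append(" ORDER BY pkey");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Boolean.getBoolean returns false when the property is unset.
        boolean ordered = Boolean.getBoolean(ORDERED_PROP);
        System.out.println(buildMetadataQuery("GenericFile_metadata", ordered));
    }
}
```

Running with -Dexample.filemgr.catalog.datasource.orderedValues=true would switch the ORDER BY on; a unit test for item 2 could insert several values for one key and assert they come back in insertion order.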

What do you think of that? Does that sound OK? Thanks for your feedback.

                
> DataSourceCatalog implementation does not preserve order of metadata values
> ---------------------------------------------------------------------------
>
>                 Key: OODT-551
>                 URL: https://issues.apache.org/jira/browse/OODT-551
>             Project: OODT
>          Issue Type: Bug
>          Components: file manager
>    Affects Versions: 0.5
>            Reporter: Luca Cinquini
>            Assignee: Luca Cinquini
>             Fix For: 0.6
>
>         Attachments: OODT-551.luca.patch.txt
>
>
> The table that stores the metadata (key, value) pairs for the File Manager 
> database-based implementation has no primary key - as a consequence, values 
> are not guaranteed to be returned in any order, which is a problem for 
> applications that rely on the order of the values (for example, among 
> different metadata keys).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
