[
https://issues.apache.org/jira/browse/OODT-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558323#comment-13558323
]
Chris A. Mattmann commented on OODT-551:
----------------------------------------
Hey BFost -- first off, yes, I agree with you. Long have I preached to you and
others that the Metadata key-values structure implies no ordering of the
values. However, a real side effect for years has been that the Lucene catalog
on persistence has maintained such an order (it still does -- the keys are
unordered b/c it used to be a hash map, but the values in it have always been
ordered). This is an artifact of the way that Lucene stores/persists fields.
That being said, the DataSourceCatalog has never preserved these semantics. It's
always, as you've said, been whatever order the values were inserted into it.
The values for a Metadata key were ordered even before the switch over to the
Metadata group style (and away from the HashMap), so entering those values into
the DataSourceCatalog is what made them unordered.
Luca and I noticed this on a JPL project ("VFASTR", a radio astronomy project)
where we followed the classic pattern of starting out with Lucene; waiting until
it doesn't scale anymore; then moving on to the DataSourceCatalog and a DB. When
doing so, a bunch of downstream code for VFASTR broke b/c all along we had made
the (incorrect) assumption that the values were ordered b/c that's the behavior
we were seeing with the LuceneCatalog. Since I'm not coding too much on that
project, and since Luca is, and since Luca didn't have the history, he and
Andrew and others wrote all that code assuming that all would be well, and then
when we switched to the DataSourceCatalog, all hell broke loose heh ;)
So, Luca had done some testing and tried to come up with something in
VFASTR-land that would work and that was the "fix"/"updates" to the
DataSourceCatalog. I encouraged him to not just keep it in JPL project ville,
but to bring it up to Apache and contribute it back. I think there wasn't
enough time to discuss that contribution, which is why I rolled the rev back and
opened it up for discussion like we're having here.
So my proposal was to:
# introduce a property into DataSourceCatalog for ordering metadata fields. By
default, it's turned off (to preserve the prior behavior of not maintaining
that ordering). When turned on, it enables a 3-4 line functionality
patch that adds an ORDER BY clause to the SQL returning met values, and it
assumes that the person deploying OODT has already installed a simple schema
update to handle that pkey.
# make sure this is also unit tested
What do you think of that? Does that sound OK? Thanks for your feedback.
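
To make the proposal above concrete, here is a minimal sketch of the idea, not the actual patch. It uses Python and an in-memory SQLite database purely for illustration; the table name, column names, and the `order_values` flag are all assumptions standing in for the real DataSourceCatalog schema and property, which may look different.

```python
import sqlite3

# Hypothetical schema: the real OODT table and column names may differ.
# The auto-incrementing pkey column plays the role of the "simple schema update".
SCHEMA = """
CREATE TABLE product_metadata (
    pkey INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id TEXT NOT NULL,
    element_id TEXT NOT NULL,
    metadata_value TEXT NOT NULL
)
"""


def add_metadata_values(conn, product_id, element_id, values):
    """Insert the values for one metadata key, one row per value."""
    conn.executemany(
        "INSERT INTO product_metadata (product_id, element_id, metadata_value) "
        "VALUES (?, ?, ?)",
        [(product_id, element_id, value) for value in values],
    )


def get_metadata_values(conn, product_id, element_id, order_values=False):
    """Return the values for one metadata key.

    order_values stands in for the proposed property (assumed name). It is
    off by default, so the query is unchanged and the row order is whatever
    the database returns.
    """
    sql = (
        "SELECT metadata_value FROM product_metadata "
        "WHERE product_id = ? AND element_id = ?"
    )
    if order_values:
        # The proposed small change: order by the pkey so values come back
        # in the order they were inserted.
        sql += " ORDER BY pkey"
    return [row[0] for row in conn.execute(sql, (product_id, element_id))]


def demo():
    """Insert three values and read them back with ordering turned on."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    add_metadata_values(conn, "prod-1", "Antenna", ["ant3", "ant1", "ant2"])
    return get_metadata_values(conn, "prod-1", "Antenna", order_values=True)
```

With the flag off the query carries no ORDER BY, so callers still cannot rely on any particular order; only deployments that have the pkey column in place would turn it on.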
> DataSourceCatalog implementation does not preserve order of metadata values
> ---------------------------------------------------------------------------
>
> Key: OODT-551
> URL: https://issues.apache.org/jira/browse/OODT-551
> Project: OODT
> Issue Type: Bug
> Components: file manager
> Affects Versions: 0.5
> Reporter: Luca Cinquini
> Assignee: Luca Cinquini
> Fix For: 0.6
>
> Attachments: OODT-551.luca.patch.txt
>
>
> The table that stores the metadata (key, value) pairs for the File Manager
> database-based implementation has no primary key - as a consequence, values
> are not guaranteed to be returned in any order, which is a problem for
> applications that rely on the order of the values (for example, among
> different metadata keys).