[
https://issues.apache.org/jira/browse/OODT-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147207#comment-14147207
]
Lewis John McGibbney edited comment on OODT-754 at 9/25/14 1:11 AM:
--------------------------------------------------------------------
[~rickdn] this is an excellent idea. [~skhudiky] and I were discussing
this today, and it is certainly a shortcoming of other extractor implementations
that they do not account for the following case.
Say you have a file named AAAA-BB-CCCCCC-DD.png which you wish to
consider as a product.
* AAAA represents the instrument/device which produced the picture
* BB is an identifier for the project the picture was produced for
* CCCCCC is the date, e.g. YYMMDD
* DD is the number of products produced on that date for that project by that
instrument.
What happens if DD > 99?
Well, the FileNameExtractor (or whatever it is called) policy breaks
and we begin ingesting incorrect information.
The extractor you describe on the wiki makes cases like the above so much
easier to deal with.
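To make the DD > 99 case concrete, here is a rough sketch in Python (not OODT code). The field names, the sample file name and both helper functions are made up for illustration; it only assumes that the existing policy slices the file name at fixed character offsets, and it does not reflect the actual API of the FileNameExtractor or of the proposed ProdTypePatternMetExtractor.

```python
import re


def extract_fixed_width(filename):
    """Hypothetical fixed-offset extraction: assumes every field has a fixed width."""
    return {
        "instrument": filename[0:4],   # AAAA
        "project": filename[5:7],      # BB
        "date": filename[8:14],        # CCCCCC (YYMMDD)
        "sequence": filename[15:17],   # DD, silently truncated once it needs 3 digits
    }


# Hypothetical pattern-based alternative: fields are split on the '-' delimiter,
# so the sequence number may have any number of digits.
PATTERN = re.compile(
    r"^(?P<instrument>[^-]+)-(?P<project>[^-]+)-(?P<date>\d{6})-(?P<sequence>\d+)\.png$"
)


def extract_by_pattern(filename):
    """Return the named fields, or raise if the file name does not match."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError("file name does not match pattern: " + filename)
    return match.groupdict()


name = "CAM1-P7-140925-100.png"  # made-up file name with a three-digit sequence
print(extract_fixed_width(name)["sequence"])  # fixed offsets drop the last digit
print(extract_by_pattern(name)["sequence"])   # the pattern keeps all three digits
```

With fixed offsets the sequence field loses its last digit without any error being raised, which is how incorrect metadata ends up ingested; the pattern-based version either captures the whole field or fails loudly.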
Thanks
> contribute ProdTypePatternMetExtractor
> --------------------------------------
>
> Key: OODT-754
> URL: https://issues.apache.org/jira/browse/OODT-754
> Project: OODT
> Issue Type: New Feature
> Components: metadata container
> Reporter: Ricky Nguyen
> Assignee: Ricky Nguyen
> Fix For: 0.8
>
>
> There has been renewed interest in implementing the
> ProdTypePatternMetExtractor proposed
> [here|https://cwiki.apache.org/confluence/display/OODT/MetExtractors+for+Crawler].
> I was going to add it to the "metadata" module under the
> "org.apache.oodt.cas.metadata.extractors" package.