Tim Allison created TIKA-4033:
---------------------------------
Summary: Improve metadata for incremental updates, take 2
Key: TIKA-4033
URL: https://issues.apache.org/jira/browse/TIKA-4033
Project: Tika
Issue Type: Task
Reporter: Tim Allison
We're currently generating a "resourceName" in the PDFParser for incremental
updates. This isn't well documented (I don't think?), but we try to reserve
"resourceName" for embedded files: it should be the actual name that the
container document has for that embedded file.
Now, we need some kind of name for the embedded resource path in
RecursiveParserWrapper, so we generate something based on the resourceName or,
if that doesn't exist, the relationship id; and if neither exists, we create
/embedded-NUM.
But that's a separate issue.
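A rough sketch of that fallback chain follows. It is not Tika's actual RecursiveParserWrapper code: a plain Map stands in for Metadata, and the relationship-id key name "embeddedRelationshipId" is an assumption for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of the naming fallback for an embedded resource's path:
 * resourceName, else relationship id, else /embedded-NUM.
 * A plain Map stands in for Tika's Metadata.
 */
public class EmbeddedPathNaming {
    static final String RESOURCE_NAME = "resourceName";
    // Assumed key name for the relationship id (illustrative only).
    static final String RELATIONSHIP_ID = "embeddedRelationshipId";

    static String pathSegment(Map<String, String> metadata, AtomicInteger counter) {
        // 1. Prefer the name the container document gives the embedded file.
        String name = metadata.get(RESOURCE_NAME);
        if (name != null && !name.isEmpty()) {
            return "/" + name;
        }
        // 2. Fall back to the relationship id, if there is one.
        String relId = metadata.get(RELATIONSHIP_ID);
        if (relId != null && !relId.isEmpty()) {
            return "/" + relId;
        }
        // 3. Otherwise, synthesize a numbered name.
        return "/embedded-" + counter.incrementAndGet();
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        Map<String, String> named = new HashMap<>();
        named.put(RESOURCE_NAME, "attachment.docx");
        System.out.println(pathSegment(named, counter));     // /attachment.docx
        System.out.println(pathSegment(new HashMap<>(), counter)); // /embedded-1
    }
}
```

Because the first step trusts "resourceName" unconditionally, anything a parser writes there ends up in the path, which is why a synthetic value is a problem.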
We should use a different metadata key so that RecursiveParserWrapper knows to
name the path /version-number-0 or similar. We should not misuse "resourceName".
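A minimal sketch of what the proposal could look like, assuming a hypothetical metadata key "incrementalUpdateNumber" that the PDFParser would set on each incremental update instead of a synthetic "resourceName". The key name and the plain Map standing in for Metadata are illustrative only, not an existing Tika property.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch: path naming checks a dedicated (hypothetical) key for
 * incremental updates, so "resourceName" is never needed for them.
 */
public class IncrementalUpdateNaming {
    static final String RESOURCE_NAME = "resourceName";
    // Hypothetical key, set by the PDF parser on each incremental update.
    static final String INCREMENTAL_UPDATE_NUMBER = "incrementalUpdateNumber";

    static String pathSegment(Map<String, String> metadata, AtomicInteger counter) {
        // Incremental updates get their own naming scheme.
        String update = metadata.get(INCREMENTAL_UPDATE_NUMBER);
        if (update != null && !update.isEmpty()) {
            return "/version-number-" + update;
        }
        // Real embedded files keep the existing behavior
        // (relationship-id fallback omitted for brevity).
        String name = metadata.get(RESOURCE_NAME);
        if (name != null && !name.isEmpty()) {
            return "/" + name;
        }
        return "/embedded-" + counter.incrementAndGet();
    }

    public static void main(String[] args) {
        Map<String, String> update = new HashMap<>();
        update.put(INCREMENTAL_UPDATE_NUMBER, "0");
        System.out.println(pathSegment(update, new AtomicInteger())); // /version-number-0
        // resourceName was never set for the update.
        System.out.println(update.containsKey(RESOURCE_NAME));       // false
    }
}
```

The point of the design is that path naming reads a dedicated key, so "resourceName" stays reserved for names that actually come from the container document.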
--
This message was sent by Atlassian Jira
(v8.20.10#820010)