[ 
https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-4171.
-------------------------------
    Fix Version/s: 2.9.2
                   3.0.0-BETA
       Resolution: Fixed

Thank you for noticing this problem and opening this issue, [~cssndrx]!

> Tika server only returns last value for PDFs that have multiple of the same 
> key
> -------------------------------------------------------------------------------
>
>                 Key: TIKA-4171
>                 URL: https://issues.apache.org/jira/browse/TIKA-4171
>             Project: Tika
>          Issue Type: Bug
>          Components: tika-server
>            Reporter: Cassandra Xia
>            Priority: Major
>             Fix For: 2.9.2, 3.0.0-BETA
>
>         Attachments: 20230801-5207_QF20-270 East River Solar Form 556 recert 
> FINAL.pdf, example-output.txt, screenshot.png
>
>
> Thanks for the great work on Tika server, it is the only OSS that can handle 
> Adobe's protected form format that FERC uses. 
> One problem I'm hitting is that the FERC form I am parsing has multiple
> values for the same key name, e.g. in the attached screenshot, lines 1-7
> all have the same key name. When Tika Server parses this PDF, it returns
> only the value in row 7 (losing the previous 6 values).
> My hunch is that somewhere in Tika Server, the values are getting stored in 
> some dictionary object, so the final value is the only survivor. Would it be 
> possible to return the extra values as a list from Tika Server? 
> Example PDF attached - thank you for taking a look!
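
The reporter's hunch (values held in a single-valued dictionary, so only the last one survives) can be illustrated with a minimal, self-contained Java sketch. This is not Tika's code or the actual fix; the class name, method names, and the "facility" key are hypothetical and exist only for illustration. It contrasts a map that overwrites repeated keys with one that collects every value per key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Tika.
final class MultiValueSketch {

    // Single-valued map: put() replaces any earlier value for the same key,
    // so only the last value seen for each key remains.
    static Map<String, String> lastValueOnly(List<String[]> fields) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String[] field : fields) {
            out.put(field[0], field[1]);
        }
        return out;
    }

    // Multi-valued map: each key accumulates its values in encounter order.
    static Map<String, List<String>> allValues(List<String[]> fields) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String[] field : fields) {
            out.computeIfAbsent(field[0], k -> new ArrayList<>()).add(field[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up form fields that share one key name, as in the report.
        List<String[]> fields = List.of(
            new String[] {"facility", "row 1"},
            new String[] {"facility", "row 2"},
            new String[] {"facility", "row 7"}
        );
        System.out.println(lastValueOnly(fields));
        System.out.println(allValues(fields));
    }
}
```

With the first method, repeated keys collapse to a single entry, which matches the symptom described above. The second method keeps all of them, which is the list-style result the reporter asks for.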



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
