[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1575:
------------------------------
Attachment: 10-814_Appendix B_v3.pdf
Form clutter... This attachment was embedded inside 776568.
With PDFBox 1.8.8, we extracted the keys for the subform (but there was no
meaningful content in this doc):
{noformat}
Briefings\n\nNo\n\n NWSI 10-814 November 10, 2008\n\n
19\n\n\n\tform1[0]: \n\t#subform[0]: \n\tPrintButton1[0]: \n\tCheckBox1[0]:
\n\tCheckBox2[0]: \n\tTextField1[0]: \n\tCheckBox5[0]: \n\tCheckBox6[0]:
\n\tTextField2[0]: \n\tTextField3[0]: \n\tCheckBox9[0]: \n\tCheckBox10[0]:
\n\tCheckBox11[0]: \n\tCheckBox12[0]: \n\tCheckBox11[1]: \n\tCheckBox12[1]:
\n\tCheckBox11[2]: \n\tCheckBox12[2]: \n\tTextField4[0]: \n\tTextField2[1]:
\n\tTextField9[0]: \n\n\t#subform[1]: \n\tCheckBox1[1]: \n\tCheckBox2[1]:
\n\tTextField1[1]: \n\tCheckBox5[1]: \n\tCheckBox6[1]: \n\tCheckBox9[1]:
\n\tCheckBox10[1]: \n\tCheckBox11[3]: \n\tCheckBox12[3]: \n\tCheckBox11[4]:
\n\tCheckBox12[4]: \n\tTextField4[1]: \n\tTextField5[0]: \n\tCheckBox5[2]:
\n\tCheckBox6[2]: \n\n\t#subform[2]: \n\tCheckBox1[2]: \n\tCheckBox2[2]:
\n\tCheckBox9[2]: \n\tCheckBox10[2]: \n\tTextField4[2]: \n\tCheckBox5[3]:
\n\tCheckBox6[3]: \n\tCheckBox1[3]: \n\tCheckBox2[3]: \n\tCheckBox5[4]:
\n\tCheckBox6[4]: \n\tCheckBox9[3]: \n\tCheckBox10[3]: \n\tTextField4[3]:
\n\tCheckBox9[4]: \n\tCheckBox10[4]: \n\tTextField6[0]: \n\tTextField7[0]:
\n\tCheckBox9[5]: \n\tCheckBox10[5]: \n\tTextField6[1]: \n\tTextField6[2]:
\n\tTextField8[0]: \n\tTextField8[1]: \n\n\t#subform[3]: \n\tCheckBox1[4]:
\n\tCheckBox2[4]: \n\tCheckBox5[5]: \n\tCheckBox6[5]: \n\tCheckBox9[6]:
\n\tCheckBox10[6]: \n\tTextField4[4]: \n\tCheckBox5[6]: \n\tCheckBox6[6]:
\n\tCheckBox1[5]: \n\tCheckBox2[5]: \n\tCheckBox5[7]: \n\tCheckBox6[7]:
\n\tCheckBox5[8]: \n\tCheckBox5[9]: \n\tCheckBox6[8]: \n\tCheckBox6[9]:
\n\tTextField8[2]: \n\tCheckBox9[7]: \n\tCheckBox10[7]: \n\tTextField6[3]:
\n\tTextField6[4]: \n\tCheckBox5[10]: \n\tCheckBox5[11]: \n\tCheckBox6[10]:
\n\tCheckBox6[11]: \n\n\n\n\n
{noformat}
With 1.8.9, only the top-level form key is extracted:
{noformat}
Briefings\n\nNo\n\n NWSI 10-814 November 10, 2008\n\n 19\n\n\n\tform1[0]:
\n\n\n\n
{noformat}
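To make the difference between the two outputs easier to check, here is a minimal sketch that lists the field keys present in one extraction but missing from the other. It is not part of Tika or PDFBox; `field_keys` and `missing_keys` are hypothetical helper names, the sample strings are abbreviated from the output above, and it assumes the escaped `\n` and `\t` sequences have already been decoded into real newlines and tabs.

```python
import re

# Matches field keys as they appear in the extracted text,
# e.g. "form1[0]:", "#subform[2]:", "CheckBox11[3]:".
# Assumption: the text has been unescaped, so "\t" is a real tab
# and not a literal backslash followed by "t".
FIELD_KEY = re.compile(r"#?\w+\[\d+\]:")


def field_keys(extracted: str) -> list[str]:
    """Return the form-field keys found in extracted text, in order."""
    return [m.group(0).rstrip(":") for m in FIELD_KEY.finditer(extracted)]


def missing_keys(old: str, new: str) -> list[str]:
    """Return keys present in the old extraction but absent from the new one."""
    seen = set(field_keys(new))
    return [k for k in field_keys(old) if k not in seen]


# Abbreviated samples modelled on the two outputs above (not the full text).
old_text = "19\n\n\n\tform1[0]: \n\t#subform[0]: \n\tPrintButton1[0]: \n\tCheckBox1[0]: \n"
new_text = "19\n\n\n\tform1[0]: \n\n\n\n"

print(missing_keys(old_text, new_text))
```

Run against the full outputs, this would list every subform, checkbox, and text field key that no longer appears in the newer extraction.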
> Upgrade to PDFBox 1.8.9 when available
> --------------------------------------
>
> Key: TIKA-1575
> URL: https://issues.apache.org/jira/browse/TIKA-1575
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Attachments: 10-814_Appendix B_v3.pdf,
> PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT.xlsx,
> PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT_reports.zip
>
>
> The PDFBox community is about to release 1.8.9. Let's use this issue to
> track discussions before the release and to track Tika's upgrade to PDFBox
> 1.8.9.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)