unsubscribe

On Thu, Oct 11, 2018 at 12:49 PM Hudson (JIRA) <[email protected]> wrote:

>
> [ https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_TIKA-2D2735-3Fpage-3Dcom.atlassian.jira.plugin.system.issuetabpanels-3Acomment-2Dtabpanel-26focusedCommentId-3D16646969-23comment-2D16646969&d=DwIFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=hWcASyFQmOiqKtRZsobP0w&m=QX4MmHkznfkIhOlAAvMpMpH1-Klfpw0on3kIvGF-NOw&s=AwS5sC4rfobH6ZIR6xweVrD0Tn_-DNyCi7gZaV3dDFM&e= ]
>
> Hudson commented on TIKA-2735:
> ------------------------------
>
> FAILURE: Integrated in Jenkins build tika-branch-1x #113 (See [ https://urldefense.proofpoint.com/v2/url?u=https-3A__builds.apache.org_job_tika-2Dbranch-2D1x_113_&d=DwIFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=hWcASyFQmOiqKtRZsobP0w&m=QX4MmHkznfkIhOlAAvMpMpH1-Klfpw0on3kIvGF-NOw&s=YGO9-ykotYFQaBOLGtOXkZNSmmPzYQNBJMll0DBuMIQ&e= ])
> TIKA-2735 -- allow user to avoid extracting "master" sections and notes (tallison: [ https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_tika_commit_307a8bd592d6e25419bbad19aac47cc7de201c4d&d=DwIFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=hWcASyFQmOiqKtRZsobP0w&m=QX4MmHkznfkIhOlAAvMpMpH1-Klfpw0on3kIvGF-NOw&s=ml87qxUhpeY6vmA_VfyJKvP_PjaXhxwqsPN0jJE5b_U&e= ])
> * (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSLFPowerPointExtractorDecorator.java
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java
> * (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParserConfig.java
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/SXSLFPowerPointExtractorDecorator.java
> * (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXSLFExtractorTest.java
>
>
> > notes and footer contents are duplicated in extracting text from power point slides
> > -----------------------------------------------------------------------------------
> >
> > Key: TIKA-2735
> > URL: https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_TIKA-2D2735&d=DwIFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=hWcASyFQmOiqKtRZsobP0w&m=QX4MmHkznfkIhOlAAvMpMpH1-Klfpw0on3kIvGF-NOw&s=tWgXQDsRm26dLawXmBaknk92SsTf8g-42yM2VHKyiiI&e=
> > Project: Tika
> > Issue Type: Bug
> > Components: handler
> > Affects Versions: 1.18
> > Reporter: feng ye
> > Priority: Major
> > Attachments: Oneslide.ppt, pptTextResults.txt
> >
> >
> > notes and footer contents are duplicated at the end when extract text from ppt slides (like the one in the attachment). Both the input file and the text results are attached.
> > Is there a configuration option that can be used to suppress this kind of duplication?
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>

--
Warm Regards
Anubha Balani
