[ https://issues.apache.org/jira/browse/TIKA-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicholas DiPiazza updated TIKA-3125:
------------------------------------
    Description:
Using the attached docx file, when I parse it with the {{/unpack}} endpoint, I get a {{__TEXT__}} file that contains this:
{code:java}
Sadfsadfsaf
Asdfsafsafasfsafd
Asdf2
Asfd3
asfd
{code}
But when I parse it with {{/rmeta/text}}, I get:
{code:java}
Launching ms word
Sadfsadfsaf
Asdfsafsafasfsafd
Asdf2
Asfd3
asfd
{code}
Why does the {{/rmeta/text}} output start with a bunch of leading \n characters?

  was:
Using the attached docx file, when I parse it with the {{/unpack}} endpoint, I get a {{__DATA__}} file that contains this:
{code:java}
Sadfsadfsaf
Asdfsafsafasfsafd
Asdf2
Asfd3
asfd
{code}
But when I parse it with {{/rmeta/text}}, I get:
{code:java}
Launching ms word
Sadfsadfsaf
Asdfsafsafasfsafd
Asdf2
Asfd3
asfd
{code}
Why does the {{/rmeta/text}} output start with a bunch of leading \n characters?

> rmeta/text and unpack - the __TEXT__ file and X-TIKA:content differ by some leading new line characters
> -------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3125
>                 URL: https://issues.apache.org/jira/browse/TIKA-3125
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Nicholas DiPiazza
>            Priority: Major
>         Attachments: test-ooxml.docx
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
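A minimal sketch (Python, standard library only) for checking whether the two outputs differ only by leading newlines. It assumes the {{/rmeta/text}} response body has already been saved as a string, that it is a JSON array of metadata objects whose first entry is the container document, and that the contents of the {{__TEXT__}} file are also available as a string. The helper names {{rmeta_content}} and {{describe_difference}} are hypothetical and not part of Tika.

{code:python}
import json


def rmeta_content(rmeta_json: str, index: int = 0) -> str:
    """Return X-TIKA:content from one entry of an /rmeta JSON response.

    Assumption: the response is a JSON array with one metadata object per
    document, and the container document is entry 0.
    """
    entries = json.loads(rmeta_json)
    return entries[index].get("X-TIKA:content", "")


def describe_difference(rmeta_text: str, unpack_text: str) -> str:
    """Classify how the /rmeta text and the __TEXT__ file text differ."""
    if rmeta_text == unpack_text:
        return "identical"
    # Strip only leading \r and \n characters from both sides.
    if rmeta_text.lstrip("\r\n") == unpack_text.lstrip("\r\n"):
        return "leading newlines only"
    # Fall back to ignoring all leading/trailing whitespace.
    if rmeta_text.strip() == unpack_text.strip():
        return "leading/trailing whitespace only"
    return "content differs"
{code}

Usage: pass {{rmeta_content(response_body)}} and the {{__TEXT__}} file contents to {{describe_difference}}. A result of "content differs" means there is more than a whitespace difference, for example an extra line such as "Launching ms word".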