[ https://issues.apache.org/jira/browse/TIKA-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicholas DiPiazza updated TIKA-3125:
------------------------------------
Description:
Using the attached docx file, when I parse it with the {{/unpack}} endpoint, I get a {{__TEXT__}} file that contains this:
{code:java}
[[bookmark: _GoBack]Launching ms word
Sadfsadfsaf
Asdfsafsafasfsafd
Asdf2
Asfd3
asfd
{code}
But when I parse it with {{/rmeta/text}}, I get an {{X-TIKA:content}} field that contains:
{code:java}
Launching ms word
Sadfsadfsaf
Asdfsafsafasfsafd
Asdf2
Asfd3
asfd
{code}
Why do these differ? It seems like the {{/rmeta/text}} output starts with a bunch of leading \n characters. There is also this strange {{[[bookmark: _GoBack]}} in the {{/unpack}} output that I wasn't expecting either, and I'm not sure what it means. Perhaps they are just fundamentally different outputs and this is normal behavior?
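One quick way to check whether the two outputs differ only at the start is to strip leading whitespace from both and test whether what remains of the {{X-TIKA:content}} value is a suffix of the {{__TEXT__}} file; whatever is left over in front is the extra prefix. This is a hypothetical helper sketch, not part of Tika: the class and method names are made up, the sample strings are abbreviated copies of the outputs above, and the number of leading newlines in the {{/rmeta/text}} sample is assumed for illustration. It only confirms where the outputs differ, not why Tika produces them.
{code:java}
import java.util.Optional;

/**
 * Hypothetical helper (not part of Tika) for comparing the __TEXT__ entry
 * from /unpack with the X-TIKA:content value from /rmeta/text.
 */
public class LeadingDiff {

    // Remove any run of leading whitespace, e.g. the extra \n characters
    // reported at the start of the /rmeta/text output.
    public static String stripLeadingWhitespace(String s) {
        return s.replaceFirst("^\\s+", "");
    }

    /**
     * Returns the text that withPrefix has in front of other once leading
     * whitespace is removed from both, or empty if the remainders do not
     * line up (i.e. the outputs differ somewhere other than at the start).
     */
    public static Optional<String> extraPrefix(String withPrefix, String other) {
        String a = stripLeadingWhitespace(withPrefix);
        String b = stripLeadingWhitespace(other);
        if (!a.endsWith(b)) {
            return Optional.empty();
        }
        return Optional.of(a.substring(0, a.length() - b.length()));
    }

    public static void main(String[] args) {
        // Abbreviated samples; the three leading newlines are an assumption.
        String unpackText = "[[bookmark: _GoBack]Launching ms word\nSadfsadfsaf\nasfd\n";
        String rmetaText = "\n\n\nLaunching ms word\nSadfsadfsaf\nasfd\n";
        // Prints: extra prefix: [[bookmark: _GoBack]
        extraPrefix(unpackText, rmetaText).ifPresentOrElse(
                p -> System.out.println("extra prefix: " + p),
                () -> System.out.println("outputs differ beyond the start"));
    }
}
{code}
If this prints only the bookmark marker, the body text is identical and the difference is limited to the leading whitespace and that marker.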
> rmeta/text and unpack - the __TEXT__ file and X-TIKA:content differ by some
> leading new line characters
> -------------------------------------------------------------------------------------------------------
>
> Key: TIKA-3125
> URL: https://issues.apache.org/jira/browse/TIKA-3125
> Project: Tika
> Issue Type: Bug
> Reporter: Nicholas DiPiazza
> Priority: Major
> Attachments: test-ooxml.docx
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)