tballison commented on a change in pull request #520: URL: https://github.com/apache/tika/pull/520#discussion_r818943762
##########
File path: tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
##########
@@ -6422,7 +6422,7 @@
 <!-- match X- DKIM- ARC- at start of file and then require at least one of the usual: from, received, date...but look farther into the file because of the X|DKIM|ARC headers-->
- <match value="(X|DKIM|ARC)-" type="regex" offset="0">
+ <match value="(X|DKIM|ARC)-" type="regex" offset="0:1024">

Review comment:
I worry about looking for X- anywhere in the first 1024 bytes without requiring a \n before it. What would you think of adding something like this into the previous minShouldMatch=2 clause?
`
<match value="\nX-" type="string" offset="0:1024">
<match value="\nDKIM-" type="string" offset="0:1024">
<match value="\nARC-" type="string" offset="0:1024">
`
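
To make the concern concrete, here is a minimal sketch in Python (not Tika code) that contrasts the two approaches. It only approximates the magic semantics by scanning the first 1024 bytes of a buffer; the function names and sample inputs are hypothetical and chosen for illustration.

```python
import re

WINDOW = 1024  # bytes inspected, loosely mirroring offset="0:1024"


def unanchored_hit(data: bytes) -> bool:
    """Regex-style check: (X|DKIM|ARC)- anywhere in the first WINDOW bytes."""
    return re.search(rb"(X|DKIM|ARC)-", data[:WINDOW]) is not None


def line_start_hit(data: bytes) -> bool:
    """String-style check: the header prefix must directly follow a newline."""
    head = data[:WINDOW]
    return any(prefix in head for prefix in (b"\nX-", b"\nDKIM-", b"\nARC-"))


# Plain text that is not an email but happens to contain "X-" mid-word.
not_email = b"The UNIX-like toolchain ships with LINUX-compatible headers.\n"

# A small message with an X- header that is not on the first line.
email = b"Received: from mx.example.org\nX-Mailer: demo\nFrom: a@example.org\n"

print(unanchored_hit(not_email), line_start_hit(not_email))
print(unanchored_hit(email), line_start_hit(email))
```

In this sketch the unanchored pattern fires on `UNIX-like`, while the newline-prefixed strings only fire on the real header line. Note that a header on the very first line has no preceding `\n`, so the newline-prefixed form alone would not catch it; that case would still need a match at offset 0.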