[jira] [Commented] (TIKA-1483) Create a general raw string parser

Chris A. Mattmann (JIRA) Tue, 24 Feb 2015 22:51:21 -0800

    [ 
https://issues.apache.org/jira/browse/TIKA-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336107#comment-14336107
 ]


Chris A. Mattmann commented on TIKA-1483:
-----------------------------------------

[~lfcnassif] I tried to apply this patch and am getting this:

{noformat}
[chipotle:~/tmp/tika] mattmann% patch -p1 < TIKA-1483_v2.patch
patching file tika-parsers/src/main/java/org/apache/tika/parser/strings/Latin1StringsParser.java
patching file tika-parsers/src/test/java/org/apache/tika/parser/strings/Latin1StringsParserTest.java
patch unexpectedly ends in middle of line
patch: **** malformed patch at line 403:  

[chipotle:~/tmp/tika] mattmann% 

{noformat}

Any ideas?


> Create a general raw string parser
> ----------------------------------
>
>                 Key: TIKA-1483
>                 URL: https://issues.apache.org/jira/browse/TIKA-1483
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Luis Filipe Nassif
>         Attachments: TIKA-1483.patch, TIKA-1483_v2.patch
>
>
> I think it would be very useful to add a general parser able to extract raw 
> strings from files (like the strings command), which can be used as the 
> fallback parser for all mimetypes not having a specific parser 
> implementation, like application/octet-stream. It can also be used as a 
> fallback for corrupt files throwing a TikaException.
> It must be configured with the script/language to be extracted from the files 
> (I have currently implemented one specific to Latin1).
> It can use heuristics to extract strings encoded with different charsets 
> within the same file, mainly the common ISO-8859-1, UTF-8 and UTF-16.
> What does the community think about that?
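
For readers unfamiliar with the strings command, the idea in the description above can be sketched roughly as follows. This is only an illustration, not the code in the attached patch: the class RawStringsSketch, the helpers isPrintableLatin1 and extractStrings, and the minLength parameter are all hypothetical names, and the sketch handles only single-byte ISO-8859-1 text, with none of the multi-charset heuristics the description mentions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NOT the Latin1StringsParser from the attached patch.
public class RawStringsSketch {

    // Assumption: "printable" means tab, printable ASCII, or the
    // upper Latin-1 range 0xA0-0xFF.
    static boolean isPrintableLatin1(int b) {
        return b == 0x09 || (b >= 0x20 && b <= 0x7E) || (b >= 0xA0 && b <= 0xFF);
    }

    // Collects every run of printable bytes that is at least minLength long,
    // decoding each run as ISO-8859-1.
    static List<String> extractStrings(byte[] data, int minLength) {
        List<String> result = new ArrayList<>();
        int start = -1; // index where the current run began, or -1 if none
        for (int i = 0; i <= data.length; i++) {
            boolean printable = i < data.length && isPrintableLatin1(data[i] & 0xFF);
            if (printable) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                // The run ended (or the input ended); keep it if long enough.
                if (i - start >= minLength) {
                    result.add(new String(data, start, i - start, StandardCharsets.ISO_8859_1));
                }
                start = -1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Binary noise around one long run ("Tika") and one short run ("ab").
        byte[] sample = {0x00, 0x01, 'T', 'i', 'k', 'a', 0x00, 'a', 'b', 0x02};
        for (String s : extractStrings(sample, 4)) {
            System.out.println(s);
        }
    }
}
```

With a minimum length of 4, the sample keeps the run "Tika" and drops the two-byte run "ab"; a real fallback parser would stream the input rather than hold the whole file in memory.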



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
