[ 
https://issues.apache.org/jira/browse/TIKA-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011246#comment-14011246
 ] 

Hudson commented on TIKA-1291:
------------------------------

SUCCESS: Integrated in tika-trunk-jdk1.7 #7 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/7/])
TIKA-1291/TIKA-1310 fix bug in JSON output from CLI (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1598023)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
* /tika/trunk/tika-app/src/main/java/org/apache/tika/io
* /tika/trunk/tika-app/src/main/java/org/apache/tika/io/json
* /tika/trunk/tika-app/src/main/java/org/apache/tika/io/json/JsonMetadataSerializer.java
* /tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
* /tika/trunk/tika-app/src/test/resources/test-data/testJsonMultipleInts.html


> Invalid JSON output on CLI
> --------------------------
>
>                 Key: TIKA-1291
>                 URL: https://issues.apache.org/jira/browse/TIKA-1291
>             Project: Tika
>          Issue Type: Bug
>          Components: cli, metadata
>    Affects Versions: 1.4, 1.5
>            Reporter: Steffen
>            Assignee: Tim Allison
>             Fix For: 1.6
>
>
> Getting metadata from the Tika CLI with the output format set to JSON
> sometimes produces invalid JSON. I only found float/array errors here in
> JIRA, so I created this ticket for a new case.
> In my case the file that led to the invalid JSON output was a PNG file (that
> I unfortunately can't provide for testing):
> {noformat}
> { "Application Record Version":4, 
> "Component 1":"Y component: Quantization table 0, Sampling factors 2 horiz/2 vert", 
> "Component 2":"Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert", 
> "Component 3":"Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert", 
> "Compression Type":"Baseline", 
> "Content-Length":113081, 
> "Content-Type":"image/jpeg", 
> "Data Precision":"8 bits", 
> "IPTC-NAA record":"24 bytes binary data", 
> "Image Height":"479 pixels", 
> "Image Width":"671 pixels", 
> "Number of Components":3, 
> "Resolution Units":"inch", 
> "Unknown tag (0x02f0)":35,0,556,479, 
> "X Resolution":"220 dots", 
> "Y Resolution":"220 dots", 
> "resourceName":18, 
> "tiff:BitsPerSample":8, 
> "tiff:ImageLength":479, 
> "tiff:ImageWidth":671 }
> {noformat}
> The {noformat}"Unknown tag (0x02f0)":35,0,556,479, {noformat} is invalid JSON.
> It would be nice if Tika always produced valid JSON output. For other cases
> that might not be caught by the fix for this ticket, it would be nice to
> have a CLI argument/option that disables the output of certain (unknown?)
> fields or accepts a whitelist of field names to output. That way users can
> bridge the time until new Tika releases by being more specific on the
> shell. If that feature already exists, I apologize for not having found it,
> and a hint to the CLI option would be nice.
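
The failure quoted above comes from a multi-valued field being written as a bare comma-separated list. A minimal sketch of the general remedy follows: emit multi-valued fields as JSON arrays, with an optional field-name whitelist of the kind the reporter asks for. This is not the code committed in r1598023. The function name metadata_to_json, the whitelist parameter, and the choice to keep every value as a string are assumptions made for illustration only.

```python
import json


def metadata_to_json(metadata, whitelist=None):
    """Serialize a {name: [values]} mapping to a JSON object.

    Hypothetical helper, not part of Tika. Single-valued fields become
    JSON strings; all other fields become JSON arrays of strings, so the
    output is always parseable.
    """
    out = {}
    for name in sorted(metadata):
        # Optional whitelist: skip any field the caller did not ask for.
        if whitelist is not None and name not in whitelist:
            continue
        values = metadata[name]
        out[name] = values[0] if len(values) == 1 else list(values)
    return json.dumps(out)


# Sample values taken from the output quoted in the issue description.
metadata = {
    "Unknown tag (0x02f0)": ["35", "0", "556", "479"],
    "tiff:ImageWidth": ["671"],
    "Content-Type": ["image/jpeg"],
}

text = metadata_to_json(metadata)
parsed = json.loads(text)  # parses without error, unlike the quoted output
filtered = json.loads(metadata_to_json(metadata, whitelist={"Content-Type"}))
```

Keeping values as strings avoids guessing which fields are numeric; a real serializer may choose differently.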



--
This message was sent by Atlassian JIRA
(v6.2#6252)
