[ https://issues.apache.org/jira/browse/TIKA-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011258#comment-14011258 ]
Hudson commented on TIKA-1291:
------------------------------
SUCCESS: Integrated in tika-trunk-jdk1.6 #7 (See
[https://builds.apache.org/job/tika-trunk-jdk1.6/7/])
TIKA-1291/TIKA-1310 fix bug in JSON output from CLI (tallison:
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1598023)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
* /tika/trunk/tika-app/src/main/java/org/apache/tika/io
* /tika/trunk/tika-app/src/main/java/org/apache/tika/io/json
* /tika/trunk/tika-app/src/main/java/org/apache/tika/io/json/JsonMetadataSerializer.java
* /tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
* /tika/trunk/tika-app/src/test/resources/test-data/testJsonMultipleInts.html
> Invalid JSON output on CLI
> --------------------------
>
> Key: TIKA-1291
> URL: https://issues.apache.org/jira/browse/TIKA-1291
> Project: Tika
> Issue Type: Bug
> Components: cli, metadata
> Affects Versions: 1.4, 1.5
> Reporter: Steffen
> Assignee: Tim Allison
> Fix For: 1.6
>
>
> Getting metadata from Tika via the CLI with the output format set to JSON
> sometimes produces invalid JSON. I only found float/array errors here in JIRA,
> so I created this ticket for a new case.
> In my case, the file that led to invalid JSON output was a PNG file (which I
> unfortunately can't provide for testing):
> {noformat}
> { "Application Record Version":4,
> "Component 1":"Y component: Quantization table 0, Sampling factors 2 horiz/2 vert",
> "Component 2":"Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert",
> "Component 3":"Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert",
> "Compression Type":"Baseline",
> "Content-Length":113081,
> "Content-Type":"image/jpeg",
> "Data Precision":"8 bits",
> "IPTC-NAA record":"24 bytes binary data",
> "Image Height":"479 pixels",
> "Image Width":"671 pixels",
> "Number of Components":3,
> "Resolution Units":"inch",
> "Unknown tag (0x02f0)":35,0,556,479,
> "X Resolution":"220 dots",
> "Y Resolution":"220 dots",
> "resourceName":18,
> "tiff:BitsPerSample":8,
> "tiff:ImageLength":479,
> "tiff:ImageWidth":671 }
> {noformat}
> The line {noformat}"Unknown tag (0x02f0)":35,0,556,479, {noformat} is invalid JSON: the value is neither a quoted string nor a JSON array.
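A minimal sketch of output that stays valid for a multi-valued field follows. It assumes every value is emitted as a JSON string and that a field with more than one value becomes a JSON array of strings. It is not the actual JsonMetadataSerializer from r1598023, and the class and method names are hypothetical; a plain Map<String, List<String>> stands in for Tika's Metadata object so the example runs without Tika on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: serialize multi-valued metadata to valid JSON.
 * Not Tika's JsonMetadataSerializer; the Map stands in for Tika's Metadata.
 */
public class MultiValuedJsonSketch {

    /** Escapes s as a JSON string literal, including the surrounding quotes. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters must be written as hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /**
     * One value becomes a JSON string; several values become a JSON array of strings.
     * Values are always quoted because their types are not known at this point.
     */
    static String toJson(Map<String, List<String>> metadata) {
        List<String> members = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            List<String> values = e.getValue();
            String json;
            if (values.size() == 1) {
                json = quote(values.get(0));
            } else {
                List<String> quoted = new ArrayList<>();
                for (String v : values) {
                    quoted.add(quote(v));
                }
                json = "[" + String.join(",", quoted) + "]";
            }
            members.add(quote(e.getKey()) + ":" + json);
        }
        return "{" + String.join(",", members) + "}";
    }

    public static void main(String[] args) {
        Map<String, List<String>> md = new LinkedHashMap<>();
        md.put("Unknown tag (0x02f0)", Arrays.asList("35", "0", "556", "479"));
        md.put("tiff:ImageWidth", Arrays.asList("671"));
        System.out.println(toJson(md));
    }
}
```

Quoting every value avoids having to guess whether "35" is a number or a string. For the field from this report, main prints {"Unknown tag (0x02f0)":["35","0","556","479"],"tiff:ImageWidth":"671"}.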
> It would be nice if Tika always produced valid JSON output. For other
> cases that might not be caught by the fix for this ticket, it would be nice to
> have a CLI argument/option that suppresses the output of certain (unknown?)
> fields or accepts a whitelist of field names to output. That way users
> could work around such problems until a new Tika release by being more
> specific on the command line. If that feature already exists, I apologize for
> not having found it, and a hint to the relevant CLI option would be appreciated.
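The filtering requested above can be sketched as a predicate applied to the metadata before it is serialized. This is a hypothetical illustration, not an existing Tika CLI option: the class name, the method name, and how a whitelist or deny rule would be passed on the command line are all assumptions, and a plain Map again stands in for Tika's Metadata.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: drop metadata fields before they are serialized.
 * Not an existing Tika CLI option; a Map stands in for Tika's Metadata.
 */
public class MetadataFilterSketch {

    /** Returns a copy holding only the fields whose names satisfy keep, in original order. */
    static Map<String, List<String>> filter(Map<String, List<String>> metadata,
                                            Predicate<String> keep) {
        Map<String, List<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            if (keep.test(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> md = new LinkedHashMap<>();
        md.put("Content-Type", Arrays.asList("image/jpeg"));
        md.put("Unknown tag (0x02f0)", Arrays.asList("35", "0", "556", "479"));
        md.put("tiff:ImageWidth", Arrays.asList("671"));

        // Whitelist: keep only explicitly named fields.
        Set<String> whitelist = new HashSet<>(Arrays.asList("Content-Type", "tiff:ImageWidth"));
        System.out.println(filter(md, whitelist::contains).keySet());

        // Deny rule: drop fields the parser could not name.
        System.out.println(filter(md, name -> !name.startsWith("Unknown tag")).keySet());
    }
}
```

Both calls in main keep Content-Type and tiff:ImageWidth and drop the unknown tag. A whitelist is stricter, while a deny rule keeps any new fields a later parser version adds.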
--
This message was sent by Atlassian JIRA
(v6.2#6252)