[ 
https://issues.apache.org/jira/browse/BEAM-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491845#comment-15491845
 ] 

ASF GitHub Bot commented on BEAM-618:
-------------------------------------

GitHub user ajamato closed the pull request at:

    https://github.com/apache/incubator-beam/pull/947


> Python SDKs writes non RFC compliant JSON files for BQ Export
> -------------------------------------------------------------
>
>                 Key: BEAM-618
>                 URL: https://issues.apache.org/jira/browse/BEAM-618
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Alex Amato
>            Assignee: Frances Perry
>
> The Python SDK uses the built-in json.dumps to write JSON files to GCS for
> the BQ Exporter. BigQuery can fail to parse these files when it tries to
> load them into a BQ table, because json.dumps can emit JSON that does not
> conform to the JSON RFC.
> The json module documentation lists a few cases that are not RFC compliant:
> https://docs.python.org/2/library/json.html#standard-compliance-and-interoperability
> The main issue we run into is NAN, INF and -INF values.
> These fail with a confusing error (and we delete the GCS files, making it
> hard to debug):
> JSON table encountered too many errors, giving up. Rows JSON parsing error in 
> row starting at position
> We can set the allow_nan argument of json.dumps to False to address these
> issues. With this argument set, json.dumps raises the error below when a
> user tries to write a file containing INF, -INF or NAN values. We may want
> to catch this error and mention that INF and NAN are not allowed.
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
>     sort_keys=sort_keys, **kw).encode(obj)
>   File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
>     chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
>     return _iterencode(o, 0)
> ValueError: Out of range float values are not JSON compliant
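
A minimal sketch of the approach described in the issue, assuming rows are plain dicts. The helper name `to_strict_json` and the wording of its error message are hypothetical and not taken from the Beam SDK; only `json.dumps` and its `allow_nan` argument come from the standard library.

```python
import json


def to_strict_json(row):
    """Serialize a row to JSON, rejecting NAN, INF and -INF values.

    Hypothetical helper for illustration; not part of the Beam SDK.
    """
    try:
        # allow_nan=False makes json.dumps raise ValueError instead of
        # emitting the non-standard tokens NaN, Infinity and -Infinity.
        return json.dumps(row, allow_nan=False)
    except ValueError as e:
        # Re-raise with a message that names the actual cause.
        raise ValueError(
            'Cannot write row as JSON: NAN, INF and -INF values are not '
            'allowed. Original error: %s. Row: %r' % (e, row))


# Default behavior: the output contains a bare NaN token, which is not
# valid JSON under the RFC.
print(json.dumps({'x': float('nan')}))  # {"x": NaN}
```

With a wrapper like this the failure would surface when the row is serialized, rather than later as a BigQuery load error.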



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
