[ https://issues.apache.org/jira/browse/BEAM-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491846#comment-15491846 ]
ASF GitHub Bot commented on BEAM-618:
-------------------------------------

GitHub user ajamato reopened a pull request:

    https://github.com/apache/incubator-beam/pull/947

    [BEAM-618] Disallow NAN, INF and -INF invalid JSON values in bigquery exporter

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:

     - [x] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes.)
     - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [x] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

    ---
    Now exporting JSON will fail with invalid NAN, INF or -INF values.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajamato/incubator-beam py_json

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/947.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #947

----
commit 442bed71e68524368408573ce0bcb22901d7f861
Author: Alex Amato <ajam...@ajamato2016.sea.corp.google.com>
Date:   2016-09-09T00:57:28Z

    Set allow_nan=False on bigquery JSON encoding

commit bd7f920faf41a3a203d854de626c9dcd90a90e29
Author: Alex Amato <ajam...@ajamato2016.sea.corp.google.com>
Date:   2016-09-12T20:09:34Z

    Remove unused line

commit 441b59bdf2de88cae119f0399bd38f64ecbbf96f
Author: Alex Amato <ajam...@ajamato2016.sea.corp.google.com>
Date:   2016-09-14T17:05:25Z

    Fix Lint Error

----


> Python SDKs writes non RFC compliant JSON files for BQ Export
> -------------------------------------------------------------
>
>                 Key: BEAM-618
>                 URL: https://issues.apache.org/jira/browse/BEAM-618
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Alex Amato
>            Assignee: Frances Perry
>
> The Python SDK uses the built-in json.dumps to write JSON files to GCS for
> the BQ exporter. BigQuery can fail to parse these files when it loads them
> into a BQ table, because json.dumps can produce JSON that does not conform
> to the JSON RFC (IETF RFC 7159).
> The json module documentation lists a few cases in which its output is not
> RFC-compliant:
> https://docs.python.org/2/library/json.html#standard-compliance-and-interoperability
> The main issue we run into is NAN, INF and -INF values.
> These fail with a confusing error (and we delete the GCS files, making it
> hard to debug):
> JSON table encountered too many errors, giving up. Rows JSON parsing error in
> row starting at position
> We can address this by setting the allow_nan argument of json.dumps to False.
> With that argument set, calling json.dumps with NAN/INF values (for example,
> when a user tries to write a file containing INF, -INF or NAN) produces the
> error below. We may want to catch this error and mention explicitly that INF
> and NAN are not allowed.
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
>     sort_keys=sort_keys, **kw).encode(obj)
>   File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
>     chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
>     return _iterencode(o, 0)
> ValueError: Out of range float values are not JSON compliant

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
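The approach proposed in this issue (pass allow_nan=False to json.dumps, then catch the resulting ValueError and say that INF and NAN are not allowed) can be sketched as follows. This is a minimal Python 3 sketch (the traceback above is from Python 2.7), not the code merged in PR #947: encode_row_for_bq and NonFiniteFloatError are hypothetical names used only for illustration, and the sketch assumes each row is a flat dict.

```python
import json
import math


class NonFiniteFloatError(ValueError):
    """Hypothetical error raised when a row holds NaN, Infinity or -Infinity."""


def encode_row_for_bq(row):
    """Serialize one row (a dict) to a strict JSON line for a BigQuery load.

    Hypothetical helper, not part of the Beam SDK. allow_nan=False makes
    json.dumps raise ValueError instead of emitting the non-RFC tokens
    NaN, Infinity and -Infinity.
    """
    try:
        return json.dumps(row, allow_nan=False)
    except ValueError as e:
        # Look for non-finite floats among the top-level fields only;
        # nested records and repeated fields are not inspected here.
        bad = sorted(
            key for key, value in row.items()
            if isinstance(value, float) and not math.isfinite(value)
        )
        if not bad:
            # Some other ValueError (e.g. a circular reference, or a
            # non-finite value nested deeper): re-raise it unchanged.
            raise
        raise NonFiniteFloatError(
            "NaN, Infinity and -Infinity are not allowed in BigQuery JSON "
            "rows; offending fields: " + ", ".join(bad)
        ) from e


if __name__ == "__main__":
    # Default json.dumps happily writes the non-compliant token NaN.
    print(json.dumps({"score": float("nan")}))  # {"score": NaN}
    # Strict encoding of a finite row is unchanged.
    print(encode_row_for_bq({"name": "a", "score": 1.5}))  # {"name": "a", "score": 1.5}
    try:
        encode_row_for_bq({"name": "b", "score": float("inf")})
    except NonFiniteFloatError as err:
        print(err)
```

Checking only top-level fields keeps the sketch short; an exporter handling nested or repeated BigQuery fields would likely need to walk the whole value to name the offending field, although allow_nan=False alone already rejects non-finite values at any depth.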