[ 
https://issues.apache.org/jira/browse/BEAM-4444?focusedWorklogId=167582&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-167582
 ]

ASF GitHub Bot logged work on BEAM-4444:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Nov/18 01:17
            Start Date: 20/Nov/18 01:17
    Worklog Time Spent: 10m 
      Work Description: udim commented on a change in pull request #6763: 
[BEAM-4444] Parquet IO for Python SDK
URL: https://github.com/apache/beam/pull/6763#discussion_r234839939
 
 

 ##########
 File path: sdks/python/apache_beam/io/parquetio_it_test.py
 ##########
 @@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from __future__ import absolute_import
+from __future__ import division
+
+import logging
+import string
+import sys
+import unittest
+from collections import Counter
+
+import pyarrow as pa
+from nose.plugins.attrib import attr
+
+from apache_beam import Create
+from apache_beam import DoFn
+from apache_beam import FlatMap
+from apache_beam import Flatten
+from apache_beam import Map
+from apache_beam import ParDo
+from apache_beam import Reshuffle
+from apache_beam.io.filesystems import FileSystems
+from apache_beam.io.parquetio import ReadAllFromParquet
+from apache_beam.io.parquetio import WriteToParquet
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import BeamAssertException
+from apache_beam.transforms.combiners import Count
+from apache_beam.transforms.combiners import Mean
+
+
+class TestParquetIT(unittest.TestCase):
+
+  @classmethod
+  def setUpClass(cls):
+    # Method has been renamed in Python 3
+    if sys.version_info[0] < 3:
+      cls.assertCountEqual = cls.assertItemsEqual
+
+  def setUp(self):
+    pass
+
+  def tearDown(self):
+    pass
+
+  SCHEMA = pa.schema([
+      ('name', pa.binary()),
+      ('favorite_number', pa.int64()),
+      ('favorite_color', pa.binary())
+  ])
+
+  @attr('IT')
+  def test_parquetio_it(self):
+    file_prefix = "parquet_it_test"
+    init_size = 10
+    data_size = 20000
+    p = TestPipeline(is_integration_test=True)
+    pcol = self._generate_data(
+        p, file_prefix, init_size, data_size)
+    self._verify_data(pcol, init_size, data_size)
+    result = p.run()
+    result.wait_until_finish()
+
+  @staticmethod
+  def _mean_verifier(data_size, x):
+    expected = sum(range(data_size)) / data_size
+    if x != expected:
 
 Review comment:
   You're comparing floating-point numbers directly here. This happens to 
work in this case (every mean ends in .0 or .5), but it doesn't extend to the 
general case. Please compare floats within an epsilon (e.g. 
abs(a - b) < 1e-7), or just compare the sum instead of the mean.
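
A minimal sketch of both suggestions, not the code that ended up in the PR. 
The helper names `mean_is_close` and `expected_sum`, and the default 
tolerance of 1e-7, are assumptions made for illustration; the real verifier 
would presumably raise BeamAssertException instead of returning a bool.

```python
def mean_is_close(data_size, observed_mean, tol=1e-7):
  """Return True if observed_mean is within tol of the mean of range(data_size).

  Hypothetical helper: compares with an absolute tolerance rather than ==.
  """
  # float() forces true division on both Python 2 and Python 3.
  expected = sum(range(data_size)) / float(data_size)
  return abs(observed_mean - expected) < tol


def expected_sum(data_size):
  """Return the exact integer sum of range(data_size).

  Hypothetical helper: comparing integer sums avoids floating point entirely.
  """
  return data_size * (data_size - 1) // 2
```

Comparing the sum keeps the check exact, at the cost of asserting on a 
different aggregate than the mean the pipeline computes.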

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 167582)

> Parquet IO for Python SDK
> -------------------------
>
>                 Key: BEAM-4444
>                 URL: https://issues.apache.org/jira/browse/BEAM-4444
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Bruce Arctor
>            Assignee: Heejong Lee
>            Priority: Major
>          Time Spent: 10h
>  Remaining Estimate: 0h
>
> Add Parquet Support for the Python SDK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
