jorisvandenbossche commented on a change in pull request #9466:
URL: https://github.com/apache/arrow/pull/9466#discussion_r574514799



##########
File path: python/pyarrow/tests/test_dataset.py
##########
@@ -1802,6 +1802,29 @@ def test_open_dataset_from_fsspec(tempdir):
     assert dataset.schema.equals(table.schema)
 
 
+def test_filter_timestamp(tempdir):
+    # ARROW-11379
+    import pyarrow.parquet as pq
+    path = tempdir / "test_partition_timestamps"
+
+    table = pa.table({
+        "dates": ['2012-01-01', '2012-01-02'] * 5,
+        "id": range(10)})
+
+    # write dataset partitioned on dates (as strings)
+    part = ds.partitioning(table.select(['dates']).schema, flavor="hive")
+    ds.write_dataset(table, path, partitioning=part, format="feather")
+
+    # read dataset partitioned on dates (as timestamps)
+    part = ds.partitioning(pa.schema([("dates", pa.timestamp("s"))]),
+                           flavor="hive")
+    dataset = ds.dataset(path, format="feather", partitioning=part)
+
+    condition = ds.field("dates") > pd.Timestamp("2012-01-01")

Review comment:
       Something else: does filtering with a string still work, with the string 
being interpreted as a timestamp, e.g. `ds.field("dates") > "2012-01-01"`? (It 
did work based on my comment in the issue, so it would be good to cover that 
case here as well.)
   
   (And if we have three values to use in the filter, maybe put them in a for 
loop or a `pytest.mark.parametrize` to avoid repeating the rest of the code.)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
