[jira] [Commented] (ARROW-539) [Python] Support reading Parquet datasets with standard partition directory schemes

Wes McKinney (JIRA) Tue, 14 Mar 2017 08:11:01 -0700

    [ https://issues.apache.org/jira/browse/ARROW-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924382#comment-15924382 ]


Wes McKinney commented on ARROW-539:
------------------------------------

I recommend using either Spark or Impala + Ibis to generate a partitioned example dataset. Here's a 
Docker image you can pull to run Impala:

https://github.com/cloudera/ibis/blob/master/circle.yml#L43

Here are some examples of creating partitioned tables in Impala + HDFS with Ibis:

https://github.com/cloudera/ibis/blob/master/ibis/impala/tests/test_partition.py#L58

Let me generate a quick example tarball to attach to this JIRA.

> [Python] Support reading Parquet datasets with standard partition directory 
> schemes
> -----------------------------------------------------------------------------------
>
>                 Key: ARROW-539
>                 URL: https://issues.apache.org/jira/browse/ARROW-539
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Wes McKinney
>
> Currently, we only support multi-file directories with a flat structure 
> (non-partitioned). 
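For reference, a "standard" partition directory scheme usually means the Hive-style `key=value` directory layout that tools like Spark and Impala write. The sketch below is a minimal, stdlib-only illustration assuming that layout; the helper `parse_partition_keys` and the example paths are hypothetical and are not part of the pyarrow API. It shows how partition columns could be recovered from the directory components of each file path, as opposed to the flat, non-partitioned layout currently supported.

```python
from pathlib import PurePosixPath
from typing import Dict


def parse_partition_keys(path: str) -> Dict[str, str]:
    """Hypothetical helper: map each 'key=value' directory component to {key: value}.

    Assumes Hive-style partitioning. The file name itself is ignored, and
    values are kept as strings (type inference is left to the caller).
    """
    keys: Dict[str, str] = {}
    for part in PurePosixPath(path).parent.parts:
        if "=" in part:
            key, _, value = part.partition("=")
            keys[key] = value
    return keys


# Hypothetical layout of a dataset partitioned by year and month.
files = [
    "dataset/year=2016/month=11/part-00000.parquet",
    "dataset/year=2017/month=01/part-00000.parquet",
]
for f in files:
    print(f, parse_partition_keys(f))
```

A flat (non-partitioned) directory yields an empty mapping for every file, which is the only case the current reader handles.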



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
