[ https://issues.apache.org/jira/browse/ARROW-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370677#comment-16370677 ]

ASF GitHub Bot commented on ARROW-1632:
---------------------------------------

wesm commented on a change in pull request #1620: ARROW-1632: [Python] Permit categorical conversions in Table.to_pandas on a per-column basis
URL: https://github.com/apache/arrow/pull/1620#discussion_r169481823
 
 

 ##########
 File path: cpp/src/arrow/python/arrow_to_pandas.cc
 ##########
 @@ -1771,7 +1790,33 @@ Status ConvertColumnToPandas(PandasOptions options, const std::shared_ptr<Column
 
 Status ConvertTableToPandas(PandasOptions options, const std::shared_ptr<Table>& table,
                             int nthreads, MemoryPool* pool, PyObject** out) {
-  DataFrameBlockCreator helper(options, table, pool);
+  return ConvertTableToPandas(options, std::unordered_set<std::string>(), table, nthreads,
+                              pool, out);
+}
+
+Status ConvertTableToPandas(PandasOptions options,
+                            const std::unordered_set<std::string>& categorical_columns,
+                            const std::shared_ptr<Table>& table, int nthreads,
+                            MemoryPool* pool, PyObject** out) {
+  std::shared_ptr<Table> current_table = table;
+  if (!categorical_columns.empty()) {
+    FunctionContext ctx;
+    for (int i = 0; i < table->num_columns(); i++) {
+      const Column& col = *table->column(i);
+      if (categorical_columns.count(col.name())) {
+        Datum out;
+        DictionaryEncode(&ctx, Datum(col.data()), &out);
+        std::shared_ptr<ChunkedArray> array = out.chunked_array();
+        auto field = std::make_shared<Field>(
+            col.name(), array->type(), col.field()->nullable(), col.field()->metadata());
+        auto column = std::make_shared<Column>(field, array);
+        current_table->RemoveColumn(i, &current_table);
+        current_table->AddColumn(i, column, &current_table);
 
 Review comment:
   Need to check Status in these two lines
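The review asks that the `Status` returned by `RemoveColumn` and `AddColumn` be checked rather than discarded. Arrow C++ conventionally propagates errors with the `RETURN_NOT_OK` macro, so the two flagged lines would presumably become `RETURN_NOT_OK(current_table->RemoveColumn(i, &current_table));` and the same for `AddColumn`.

The sketch below shows that early-return pattern in a self-contained form. `Status`, `Table` and `ReplaceColumn` here are hypothetical stand-ins written for illustration. They are not Arrow's real classes or signatures.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::Status (NOT Arrow's real class):
// ok_ == true means success; otherwise msg_ describes the error.
class Status {
 public:
  static Status OK() { return Status(); }
  static Status Invalid(std::string msg) { return Status(false, std::move(msg)); }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  Status() = default;
  Status(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_ = true;
  std::string msg_;
};

// Early-return on error, modelled on the RETURN_NOT_OK convention in Arrow C++.
#define RETURN_NOT_OK(expr)    \
  do {                         \
    Status _st = (expr);       \
    if (!_st.ok()) return _st; \
  } while (false)

// Hypothetical immutable table that only tracks column names; every
// mutation returns a new table through `out` and reports errors via Status.
struct Table {
  std::vector<std::string> columns;

  Status RemoveColumn(int i, std::shared_ptr<Table>* out) const {
    if (i < 0 || i >= static_cast<int>(columns.size())) {
      return Status::Invalid("RemoveColumn: index out of bounds");
    }
    auto result = std::make_shared<Table>(*this);
    result->columns.erase(result->columns.begin() + i);
    *out = std::move(result);
    return Status::OK();
  }

  Status AddColumn(int i, const std::string& name, std::shared_ptr<Table>* out) const {
    if (i < 0 || i > static_cast<int>(columns.size())) {
      return Status::Invalid("AddColumn: index out of bounds");
    }
    auto result = std::make_shared<Table>(*this);
    result->columns.insert(result->columns.begin() + i, name);
    *out = std::move(result);
    return Status::OK();
  }
};

// Replace column i, propagating the first error instead of silently ignoring it.
Status ReplaceColumn(const std::shared_ptr<Table>& table, int i, const std::string& name,
                     std::shared_ptr<Table>* out) {
  std::shared_ptr<Table> removed;
  RETURN_NOT_OK(table->RemoveColumn(i, &removed));
  RETURN_NOT_OK(removed->AddColumn(i, name, out));
  return Status::OK();
}
```

Suppose `ReplaceColumn` is called with an out-of-range index. It then returns the `RemoveColumn` error and never calls `AddColumn`. The sketch passes the intermediate result through a separate `removed` pointer rather than writing back into the `shared_ptr` the method is called through. This is a choice made for the sketch only; the review did not ask for it.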

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Permit categorical conversions in Table.to_pandas on a per-column basis
> --------------------------------------------------------------------------------
>
>                 Key: ARROW-1632
>                 URL: https://issues.apache.org/jira/browse/ARROW-1632
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Wes McKinney
>            Assignee: Uwe L. Korn
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Currently this is all or nothing
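The requested behaviour comes down to dictionary-encoding only the columns the caller names. The loop in the diff does this with `DictionaryEncode` and `categorical_columns`. Below is a minimal sketch of the same idea on plain `std::vector<std::string>` columns.

`EncodedColumn`, `DictionaryEncodeStrings` and `EncodeSelectedColumns` are hypothetical names. Nulls are ignored, and the code does not reproduce Arrow's `DictionaryEncode` kernel.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical result of dictionary-encoding one string column: each value
// is replaced by an index into `dictionary`, which holds the distinct values
// in order of first appearance.
struct EncodedColumn {
  std::vector<int32_t> indices;
  std::vector<std::string> dictionary;
};

EncodedColumn DictionaryEncodeStrings(const std::vector<std::string>& values) {
  EncodedColumn result;
  std::unordered_map<std::string, int32_t> memo;  // value -> dictionary index
  result.indices.reserve(values.size());
  for (const auto& v : values) {
    auto it = memo.find(v);
    if (it == memo.end()) {
      // First time we see this value: append it to the dictionary.
      int32_t code = static_cast<int32_t>(result.dictionary.size());
      memo.emplace(v, code);
      result.dictionary.push_back(v);
      result.indices.push_back(code);
    } else {
      result.indices.push_back(it->second);
    }
  }
  return result;
}

// Encode only the columns whose names appear in `categorical_columns`;
// all other columns are left for plain (non-categorical) conversion.
std::map<std::string, EncodedColumn> EncodeSelectedColumns(
    const std::map<std::string, std::vector<std::string>>& table,
    const std::unordered_set<std::string>& categorical_columns) {
  std::map<std::string, EncodedColumn> encoded;
  for (const auto& entry : table) {
    if (categorical_columns.count(entry.first) > 0) {
      encoded.emplace(entry.first, DictionaryEncodeStrings(entry.second));
    }
  }
  return encoded;
}
```

If the set of names is empty, nothing is encoded. That matches the diff's `if (!categorical_columns.empty())` guard, which leaves the table untouched.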



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
