[ https://issues.apache.org/jira/browse/ARROW-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-2787:
----------------------------------
    Description: 
I wanted to create a simple example of reading a table in Python and passing it
to C++, but either I'm doing something wrong or there is a memory issue. When
the table gets to C++ and I print out the column names, it also prints out a lot
of junk and what looks like pydocs. Let me know if you need any more info. Thanks!

*demo.py*
{code:python}
import numpy
import pandas as pd
import pyarrow as pa  # required for pa.Table below
from absl import app
from psy.automl import cyth

def main(argv):
  sup = pd.DataFrame({
    'int': [1, 2],
    'str': ['a', 'b']
  })
  table = pa.Table.from_pandas(sup)
  cyth.c_t(table)

if __name__ == '__main__':
  app.run(main)
{code}

*cyth.pyx*
{code:python}
import pandas as pd
import pyarrow as pa
from pyarrow.lib cimport *

cdef extern from "cyth.h" namespace "psy":
  void t(shared_ptr[CTable])

def c_t(obj):
  # These prints work:
  # for i in range(obj.num_columns):
  #   print(obj.column(i).name)
  cdef shared_ptr[CTable] tbl = pyarrow_unwrap_table(obj)
  t(tbl)
{code}

*cyth.h*
{code:c++}
#include <iostream>
#include <string>
#include "arrow/api.h"
#include "arrow/python/api.h"
#include "Python.h"

namespace psy {

void t(std::shared_ptr<arrow::Table> pytable) {
  // This works
  std::cout << "NUM" << pytable->num_columns();

  // This prints a lot of garbage
  for (int i = 0; i < pytable->num_columns(); i++) {
    std::cout << pytable->column(i)->name();
  }
}

}  // namespace psy
{code}
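
The loop in cyth.h prints the names with no separator, so it is hard to tell from the output alone whether the junk is a corrupted string or just text running together. One way to narrow this down might be to print each name's length alongside an escaped copy of its bytes. A minimal sketch, assuming a hypothetical helper `describe_name` (not part of Arrow) that uses only the standard library:

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical debugging helper (not part of Arrow): renders a string as its
// length followed by the quoted contents. Non-printable bytes are written as
// \xNN escapes so that corruption is visible instead of being dumped raw.
inline std::string describe_name(const std::string& name) {
  static const char kHex[] = "0123456789abcdef";
  std::ostringstream out;
  out << "[len=" << name.size() << "] \"";
  for (unsigned char c : name) {
    if (std::isprint(c)) {
      out << static_cast<char>(c);
    } else {
      out << "\\x" << kHex[c >> 4] << kHex[c & 0x0f];
    }
  }
  out << '"';
  return out.str();
}
```

Inside `t`, the loop body would then become `std::cout << describe_name(pytable->column(i)->name()) << '\n';`. For the columns `int` and `str` a length of 3 is expected; a much larger length would suggest that the returned `std::string` itself is being misread on the C++ side, though that is only one possible explanation.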
 


  was:
I wanted to create a simple example of reading a table in Python and passing it
to C++, but either I'm doing something wrong or there is a memory issue. When
the table gets to C++ and I print out the column names, it also prints out a lot
of junk and what looks like pydocs. Let me know if you need any more info. Thanks!

 

*demo.py*
import numpy
from psy.automl import cyth
import pandas as pd
from absl import app

def main(argv):
  sup = pd.DataFrame({
  'int': [1, 2],
  'str': ['a', 'b']
  })
  table = pa.Table.from_pandas(sup)
  cyth.c_t(table)


*cyth.pyx*
import pandas as pd
import pyarrow as pa
from pyarrow.lib cimport *

cdef extern from "cyth.h" namespace "psy":
 void t(shared_ptr[CTable])

def c_t(obj):
 # These print work
 # for i in range(obj.num_columns):
 # print(obj.column(i).name
  cdef shared_ptr[CTable] tbl = pyarrow_unwrap_table(obj)
  t(tbl)

 *cyth.h*
#include <iostream>
#include <string>
#include "arrow/api.h"
#include "arrow/python/api.h"
#include "Python.h"

namespace psy {

void t(std::shared_ptr<arrow::Table> pytable) {

// This works
  std::cout << "NUM" << pytable->num_columns();

// This prints a lot of garbage
  for(int i = 0; i < pytable->num_columns(); i++) {
  std::cout << pytable->column(i)->name();
  }
 }

 



> [Python] Memory Issue passing table from python to c++ via cython
> -----------------------------------------------------------------
>
>                 Key: ARROW-2787
>                 URL: https://issues.apache.org/jira/browse/ARROW-2787
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Integration, Python
>    Affects Versions: 0.9.0
>         Environment: clang6
>            Reporter: Joseph Toth
>            Priority: Major
>              Labels: cython
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
