[
https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jacky Li resolved CARBONDATA-3255.
----------------------------------
Fix Version/s: 2.0.0
Resolution: Fixed
> CarbonData provides a Python interface to write and read structured
> and unstructured data in CarbonData
> ----------------------------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-3255
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3255
> Project: CarbonData
> Issue Type: Sub-task
> Reporter: Bo Xu
> Assignee: Bo Xu
> Priority: Major
> Fix For: 2.0.0
>
> Time Spent: 11h 10m
> Remaining Estimate: 0h
>
> Apache CarbonData already provides Java, Scala, and C++ interfaces for users.
> More and more people use Python to manage and analyze big data, so it would
> be better to provide a Python interface that can write and read structured
> and unstructured data in CarbonData, such as String, int, and binary data
> (image/voice/video). It should not depend on Apache Spark. We call it
> PYSDK.
> PYSDK is based on the CarbonData Java SDK and uses pyjnius to call Java code
> from Python. Apache Spark uses py4j in PySpark to call Java code from
> Python, but py4j performs poorly when reading big data in CarbonData format
> from Python code. The py4j documentation also notes low performance when
> reading big data:
> https://www.py4j.org/advanced_topics.html#performance. JPype is another
> popular tool for calling Java code from Python, but it stopped being updated
> several years ago, so we cannot use it. In our tests, pyjnius showed high
> performance when reading big data by calling Java code from Python, so it is
> a good choice for us.
> We have already been working on this feature for several months at
> https://github.com/xubo245/pycarbon
> Goals:
> 1. PYSDK should provide an interface for reading data
> 2. PYSDK should provide an interface for writing data
> 3. PYSDK should support basic data types
> 4. PYSDK should support projection
> 5. PYSDK should support filtering
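
To make goals 4 and 5 concrete, here is a minimal pure-Python sketch of what projection and filtering mean for a reader. It is not the PYSDK API: the class SketchReader, its methods, and the sample rows are all hypothetical, and the data lives in memory. An actual PYSDK reader would presumably delegate these steps to the CarbonData Java SDK through pyjnius rather than loop over rows in Python.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, Any]


class SketchReader:
    """Hypothetical in-memory stand-in for a reader; not the PYSDK API."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)
        self._columns: Optional[List[str]] = None
        self._predicate: Optional[Callable[[Row], bool]] = None

    def projection(self, columns: List[str]) -> "SketchReader":
        # Projection: return only the named columns.
        self._columns = list(columns)
        return self

    def filter(self, predicate: Callable[[Row], bool]) -> "SketchReader":
        # Filter: return only the rows for which the predicate is true.
        self._predicate = predicate
        return self

    def read(self) -> Iterator[Row]:
        for row in self._rows:
            # The predicate sees the full row, so it may test columns
            # that the projection later drops.
            if self._predicate is not None and not self._predicate(row):
                continue
            if self._columns is None:
                yield dict(row)
            else:
                yield {name: row[name] for name in self._columns}


# Hypothetical sample data mixing basic types with binary content.
rows = [
    {"name": "a.jpg", "size": 3, "content": b"\x01\x02\x03"},
    {"name": "b.wav", "size": 5, "content": b"\x00" * 5},
]

result = list(
    SketchReader(rows)
    .projection(["name", "size"])
    .filter(lambda row: row["size"] > 3)
    .read()
)
print(result)
```

In this sketch the filter is evaluated before the projection, which keeps the large binary column out of the returned rows while still allowing predicates on any column.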