[ 
https://issues.apache.org/jira/browse/HUDI-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233187#comment-17233187
 ] 

liwei commented on HUDI-1390:
-----------------------------

[~vinoth] [~xushiyan]

1. I am currently working on inferring the schema and partitions of files in 
object storage such as S3 or Aliyun OSS, and then creating a table in a catalog 
to store the columns and partitions, much like a Glue crawler. The file types 
include CSV, JSON, Parquet, ORC, etc. Spark or Presto can then analyze the 
data through the catalog.
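
To make the crawler idea in point 1 concrete, here is a minimal sketch of the 
partition-inference step only. It assumes the files use a Hive-style layout 
(`key=value` path segments), which the comment does not state. The function 
names `infer_partitions` and `infer_type` and the sample paths are 
hypothetical, and this is not how Glue or Hudi implements it.

```python
from collections import OrderedDict
from urllib.parse import urlparse


def infer_partitions(paths):
    """Collect partition columns from Hive-style key=value path segments.

    Assumption: partitions are encoded as key=value directories.
    """
    columns = OrderedDict()
    for path in paths:
        # Drop the scheme and bucket; keep only the object key.
        key = urlparse(path).path
        # Every segment except the last (the file name) may be a partition.
        for segment in key.strip("/").split("/")[:-1]:
            if "=" in segment:
                name, value = segment.split("=", 1)
                columns.setdefault(name, set()).add(value)
    return columns


def infer_type(values):
    """Guess a column type from string values: int, then double, else string."""
    for type_name, cast in (("int", int), ("double", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


# Hypothetical object keys, for illustration only.
paths = [
    "s3://bucket/events/year=2020/region=us/part-0.parquet",
    "s3://bucket/events/year=2021/region=eu/part-1.parquet",
]
partitions = infer_partitions(paths)
partition_schema = {name: infer_type(vals) for name, vals in partitions.items()}
```

A real crawler would also sample file contents (CSV headers, Parquet footers, 
and so on) to infer the data columns. That part is omitted here.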

2. I think Hudi provides more powerful data management capabilities, such as 
commits, metadata, and indexing. For the scenario in point 1, it would be 
valuable if we could use Hudi to manage unstructured and structured data in 
object storage without moving the data, while also supporting Hudi's update 
and commit capabilities on that data.

3. I am familiar with the bootstrap code and will do some research, then 
start an RFC. It may take some time :)

> [UMBRELLA] Support schema inference for unstructured data
> ---------------------------------------------------------
>
>                 Key: HUDI-1390
>                 URL: https://issues.apache.org/jira/browse/HUDI-1390
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: bootstrap
>            Reporter: Raymond Xu
>            Priority: Major
>              Labels: gsoc, gsoc2021, mentor
>
> (More details to be added)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
