[
https://issues.apache.org/jira/browse/HUDI-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233187#comment-17233187
]
liwei commented on HUDI-1390:
-----------------------------
[~vinoth] [~xushiyan]
1. I am currently working on inferring the schema and partitions of files in
object storage such as S3 or Aliyun OSS, and then creating a table in a catalog
to store the columns and partitions, much like a Glue crawler. The file types
include CSV, JSON, Parquet, ORC, etc. Spark or Presto can then analyze the
data through the catalog.
2. I think Hudi provides more powerful data management capabilities, such as
commits, metadata, and indexes. For the scenario in 1, it would be valuable if
we could use Hudi to manage unstructured and structured data in object storage
without moving the data, while also supporting Hudi's update and commit
capabilities on that data.
3. I am familiar with the bootstrap code and will do some research, then
start an RFC. It may take some time :)
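
To make the crawler idea in point 1 a bit more concrete, here is a minimal
sketch in plain Python. It is not Hudi or Glue code: the function names, the
type names (bigint, double, string), and the sample path are all hypothetical,
and it assumes Hive-style key=value partition directories and CSV files with a
header row. A real implementation would also have to sample large files and
handle JSON, Parquet, and ORC.

```python
import csv
import io


def infer_partitions(path: str) -> dict:
    """Collect Hive-style key=value directory segments from an object path."""
    partitions = {}
    # Skip the last segment, which is the file name itself.
    for segment in path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


def infer_type(values: list) -> str:
    """Pick the narrowest of bigint, double, string that fits every value."""
    for name, cast in (("bigint", int), ("double", float)):
        try:
            for value in values:
                cast(value)
            return name
        except (ValueError, TypeError):
            continue
    return "string"


def infer_csv_schema(text: str) -> dict:
    """Infer column names and types from CSV text that has a header row."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return {}
    return {col: infer_type([row[col] for row in rows]) for col in rows[0]}


if __name__ == "__main__":
    # Hypothetical object path and file content, for illustration only.
    print(infer_partitions("s3://bucket/events/dt=2020-11-16/region=eu/part-0.csv"))
    print(infer_csv_schema("id,price,name\n1,2.5,a\n2,3,b\n"))
```

The inferred columns and partition keys are what would be written to the
catalog table, so that the files stay where they are in object storage.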
> [UMBRELLA] Support schema inference for unstructured data
> ---------------------------------------------------------
>
> Key: HUDI-1390
> URL: https://issues.apache.org/jira/browse/HUDI-1390
> Project: Apache Hudi
> Issue Type: Improvement
> Components: bootstrap
> Reporter: Raymond Xu
> Priority: Major
> Labels: gsoc, gsoc2021, mentor
>
> (More details to be added)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)