Hey Josep,
Thanks for reaching out, and sorry for the delay; it's the holiday season.

Your understanding of the metadata table and extraMetadata is correct with
respect to Hudi. But could you shed some more light on how the additional
metadata is leveraged? Are you planning to introduce a new index for
writes or for reads? Based on your write-up, it is per-file
metadata.

By the way, if it is a general-purpose index, would you like to develop
it in the open so that others can benefit as well?

On Thu, 19 Dec 2024 at 11:51, Josep Sampé <jo...@qbeast.io> wrote:
>
> Hi everyone,
>
> Our team has started exploring the integration of Hudi into Qbeast
> (qbeast.io), and over the last few months I've been diving into Hudi's
> internals to identify the best place for integration and the most suitable
> components to use.
>
> In Qbeast, we’re introducing a new way of indexing data, which requires
> additional metadata for each Parquet file. Currently, we’re storing this
> metadata in the extraMetadata field of the commit files, as Hudi allows
> user-defined metadata there. However, I’m wondering if this is the best
> approach or if it would be better to store this information in the metadata
> table.
>
> From my understanding:
>
>    - The extraMetadata field is flexible and user-defined, which makes it
>    easy to use for custom metadata.
>    - The metadata table seems more "closed" and focused on specific
>    system-level functionalities.
>
> Would it make more sense to continue using extraMetadata, or is there a
> recommended way to extend the metadata table to include custom fields like
> these? Any guidance or best practices would be greatly appreciated!
>
> Thanks in advance!

-- 
Regards,
-Sivabalan
