Hi Vinoth,

Thanks for your comments on this. I spent some time thinking over another possibility: externalising the Hudi timeline service itself to an external server holding both operational (i.e. Hudi) and business metadata.
Would you have any opinion on that? Would it be easy to do? I do not see a way yet, apart from what I have read about RocksDB, and that is still not quite clear to me.

Best regards,

Mario.

On Mon, Jun 1, 2020 at 16:01, Vinoth Chandar <[email protected]> wrote:

> Hi Mario,
>
> Thanks for the detailed explanation. Hudi already allows extra metadata to
> be written atomically with each commit, i.e. write operation. In fact, that
> is how we track checkpoints for our delta streamer tool. It may not solve
> the need for querying the data together with this information, but it gives
> you the ability to do some basic tagging, if that's useful.
>
> >>If we enable the timeline service metadata model to be extended we could
> use the service instance itself to support specialised queries that involve
> business qualifiers in order to return a proper set of metadata pointing to
> the related commits
>
> This is a good idea actually. There is another active discuss thread on
> making the metadata queryable. There is also
> https://issues.apache.org/jira/browse/HUDI-309 which we paused for now,
> but that's more in line with what you are thinking, IIUC.
>
> Thanks
> vinoth
>
> On Mon, Jun 1, 2020 at 4:41 AM Mario de Sá Vera <[email protected]>
> wrote:
>
> > Hi Balaji,
> >
> > Business metadata are all types of info related to the business where the
> > Hudi solution is being used, from a COB (i.e. close of business date)
> > related to that commit to any qualifier that might be useful to associate
> > with that commit id. If we enable the timeline service metadata model to
> > be extended, we could use the service instance itself to support
> > specialised queries that involve business qualifiers in order to return a
> > proper set of metadata pointing to the related commits that answer a
> > business query.
> >
> > If we do not have that flexibility, we might end up creating an external
> > transaction log, and then comes the hard task of keeping that service in
> > sync with the timeline service.
> >
> > Let me know if that makes sense to you,
> >
> > Mario.
> >
> > On Mon, Jun 1, 2020 at 06:55, Balaji Varadarajan
> > <[email protected]> wrote:
> >
> > > Hi Mario,
> > > Timeline Server was designed to serve Hudi metadata for Hudi writers
> > > and readers. It may not be suitable to serve arbitrary data. But it is
> > > an interesting thought. Can you elaborate more on what kind of business
> > > metadata you are looking for? Is this something you are planning to
> > > store in commit files?
> > > Balaji.V
> > >
> > > On Sunday, May 31, 2020, 04:22:27 PM PDT, Mario de Sá Vera
> > > <[email protected]> wrote:
> > >
> > > I see a need for extending the current timeline server schema so that
> > > a flexible model could be achieved in order to accommodate business
> > > metadata.
> > >
> > > Let me know if that makes sense to anyone here...
> > >
> > > Regards,
> > >
> > > Mario.
