Yes, this was added recently in master. We added support for talking directly through the metastore APIs; previously it always went through JDBC.
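As a rough sketch of what that sync invocation could look like on a build from current master (the flag names, class name, and jar name here are assumptions based on this thread's error message and may differ between versions; check the output of HiveSyncTool's help for your build):

```shell
# Hedged sketch, not a verified command line: sync a Hudi dataset to Hive,
# toggling between JDBC and direct metastore-API sync. Flag names and the
# bundle jar name are assumptions for illustration.
java -cp "hoodie-hive-bundle.jar:$HIVE_HOME/lib/*" \
  com.uber.hoodie.hive.HiveSyncTool \
  --base-path hdfs:///data/my_hudi_table \
  --database default \
  --table my_hudi_table \
  --jdbc-url jdbc:hive2://hiveserver:10000 \
  --user hive \
  --pass hive \
  --use-jdbc false   # with the new option, talk to the metastore API directly
```

On a pre-0.4.5 build the `--use-jdbc` option does not exist, which is exactly the jcommander ParameterException Jaimin hit below.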
On Tue, Oct 22, 2019 at 12:26 AM Jaimin Shah <[email protected]> wrote:

> Hi Vinoth,
> As you said, Hudi can use Hive JDBC to talk to the metastore. Was this
> functionality added after version 0.4.5? Because I am getting this error:
> Exception in thread "main"
> com.uber.hoodie.com.beust.jcommander.ParameterException: Was passed main
> parameter '--use-jdbc' but no main parameter was defined in your arg class
>
> Thanks,
> Jaimin
>
> On Thu, 3 Oct 2019 at 10:56, Vinoth Chandar <[email protected]> wrote:
>
> > Hi Qian,
> >
> > You are right on the choice of tools for 2 and 3. But for 1, if you want
> > to do a one-time bulk load, you can look into the options on the
> > migration guide: http://hudi.apache.org/migration_guide.html
> > (HiveSyncTool is orthogonal to this; it simply registers a Hudi dataset
> > with the Hive metastore.)
> >
> > On your questions:
> > 1. You need the appropriate Hudi bundle jar to write data:
> > http://hudi.apache.org/writing_data.html . For reading, there are
> > similar instructions depending on the query engine, and yes, you would
> > copy a bundle jar and install it.
> > 2. You can choose to use Hudi without the Hive metastore, and it will
> > give you access to the Read Optimized and Incremental views (not the
> > Realtime view; that needs Hive at the moment). Hudi can use Hive JDBC to
> > talk to the metastore, if that is what you are asking.
> > 3. Hudi saves metadata in a special .hoodie folder on your DFS itself.
> > It's used for building features like incremental pull.
> >
> > Hope that helps.
> >
> > On Wed, Oct 2, 2019 at 3:12 PM Qian Wang <[email protected]> wrote:
> >
> > > Hi Kabeer,
> > >
> > > I plan to do an incremental query PoC. My use case includes:
> > >
> > > 1. Load one big Hive table located in HDFS into Hudi as a history
> > > table (I think I should use HiveSyncTool).
> > > 2. Sink streaming data from Kafka into Hudi as a real-time table (use
> > > HoodieDeltaStreamer?).
> > > 3. Join the two tables to get the incremental metrics (Spark SQL?).
> > >
> > > My questions:
> > >
> > > 1. Do I just copy the Hudi packages to the server client for
> > > deployment?
> > > 2. Does Hudi require access to the Hive metastore? My company has
> > > restricted access to the Hive metastore. Can Hudi use Hive JDBC to get
> > > metadata?
> > > 3. What is the HoodieTableMeta used for? Where is the HoodieTableMeta
> > > saved?
> > >
> > > Best,
> > > Qian
> > >
> > > On Oct 2, 2019, 2:59 PM -0700, Kabeer Ahmed <[email protected]>, wrote:
> > > > Qian,
> > > >
> > > > Welcome!
> > > > Are you able to tell us a bit more about your use case? E.g.: type
> > > > of the project, industry, complexity of the pipeline that you plan
> > > > to write (e.g.: pulling data from external APIs like the New York
> > > > taxi dataset and writing it into Hive for analysis), etc.
> > > > This will give us a bit more context.
> > > >
> > > > Thanks,
> > > > Kabeer.
> > > >
> > > > On Oct 2 2019, at 10:55 pm, Vinoth Chandar <[email protected]> wrote:
> > > > > edit:
> > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowisaHudijobdeployed?
> > > > > with the ? at the end
> > > > >
> > > > > On Wed, Oct 2, 2019 at 2:54 PM Vinoth Chandar <[email protected]> wrote:
> > > > > > Hi Qian,
> > > > > >
> > > > > > Welcome! Does
> > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#Frequentlyaskedquestions(FAQ)-HowisaHudijobdeployed?
> > > > > > help?
> > > > > >
> > > > > > On Wed, Oct 2, 2019 at 10:18 AM Qian Wang <[email protected]> wrote:
> > > > > > > Hi,
> > > > > > > I am new to Apache Hudi. I am currently working on a PoC using
> > > > > > > Hudi; can anyone point me to documents on how to deploy Apache
> > > > > > > Hudi? Thanks.
> > > > > > >
> > > > > > > Best,
> > > > > > > Eric
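To make the .hoodie metadata point above concrete: Hudi records each commit as a timestamped instant file inside the .hoodie folder, and incremental pull boils down to finding commits newer than your last checkpoint. The sketch below fakes such a folder locally to illustrate the idea; the file layout is a simplified assumption for illustration, not Hudi's exact on-disk format, and `commits_after` is a hypothetical helper, not a Hudi API.

```python
# Hedged illustration: how an incremental reader could discover commits in a
# ".hoodie"-style metadata folder that are newer than a saved checkpoint.
# The "<timestamp>.commit" naming is a simplified assumption.
import os
import tempfile


def commits_after(hoodie_dir: str, checkpoint: str) -> list:
    """Return commit timestamps in hoodie_dir newer than the checkpoint."""
    commits = [
        name[: -len(".commit")]
        for name in os.listdir(hoodie_dir)
        if name.endswith(".commit")
    ]
    # Timestamps sort lexicographically, so string comparison works here.
    return sorted(ts for ts in commits if ts > checkpoint)


# Fake a .hoodie folder holding three commits.
tmp = tempfile.mkdtemp()
for ts in ("20191001120000", "20191002120000", "20191003120000"):
    open(os.path.join(tmp, ts + ".commit"), "w").close()

print(commits_after(tmp, "20191001120000"))
# prints the two commits newer than the checkpoint:
# ['20191002120000', '20191003120000']
```

An incremental job would persist the last timestamp it processed and pass it back in as the checkpoint on the next run, which is the gist of what "incremental pull" builds on.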
