You can also just make sure that each user is using their own directory. A rough example can be found in TestHive.
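One way to sketch the per-user-directory idea on a shared machine (a hypothetical illustration for Spark 1.x, not the TestHive code itself; the /tmp path scheme and the use of `javax.jdo.option.ConnectionURL` here are my assumptions):

```scala
// Sketch: give each user a private Derby metastore directory, so
// concurrent HiveContexts don't collide on a shared ./metastore_db.
// This must run before the metastore is first touched; `sc` is the
// SparkContext provided by spark-shell, and the /tmp path scheme is
// hypothetical.
import org.apache.spark.sql.hive.HiveContext

val user = sys.props("user.name")
val hiveContext = new HiveContext(sc)
hiveContext.setConf("javax.jdo.option.ConnectionURL",
  s"jdbc:derby:;databaseName=/tmp/metastore_db_$user;create=true")
```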
Note: in Spark 2.0 there should be no need to use HiveContext unless you need to talk to a metastore.

On Thu, May 26, 2016 at 1:36 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Well, make sure that you set up a reasonable RDBMS as the metastore. Ours is Oracle, but you can get away with others. Check the supported list in:
>
>     hduser@rhes564:: :/usr/lib/hive/scripts/metastore/upgrade> ltr
>     total 40
>     drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 postgres
>     drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mysql
>     drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mssql
>     drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 derby
>     drwxr-xr-x 3 hduser hadoop 4096 May 20 18:44 oracle
>
> You have a few good ones in the list. In general the base tables (without transactional support) are around 55 (Hive 2) and don't take much space (depending on the volume of tables). I attached an E-R diagram.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 26 May 2016 at 19:09, Gerard Maas <gerard.m...@gmail.com> wrote:
>
>> Thanks a lot for the advice!
>>
>> I found out why the standalone HiveContext would not work: it was trying to deploy a Derby db, and the user had no rights to create the directory where the db is stored:
>>
>>     Caused by: java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.
>>         at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>>         at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
>>         ... 129 more
>>     Caused by: java.sql.SQLException: Directory /usr/share/spark-notebook/metastore_db cannot be created.
>>
>> Now, the new issue is that we can't start more than one context at the same time.
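The "proper metastore" route Mich describes typically means putting a hive-site.xml on Spark's classpath (e.g. in conf/). A minimal sketch assuming a MySQL backend; the host, database name, and credentials below are placeholders, not values from this thread:

```xml
<!-- conf/hive-site.xml: minimal shared-metastore sketch; all values
     are placeholders. The MySQL JDBC driver jar must also be on the
     driver classpath. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
</configuration>
```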
>> I think we will need to set up a proper metastore.
>>
>> Kind regards, Gerard.
>>
>> On Thu, May 26, 2016 at 3:06 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Using HiveContext, which is basically a SQL API within Spark, without a proper Hive setup does not make sense. It is a superset of Spark's SQLContext.
>>>
>>> In addition, simple things like registerTempTable may not work.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> On 26 May 2016 at 13:01, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
>>>
>>>> Hi Gerard,
>>>>
>>>> I’ve never had an issue using the HiveContext without a hive-site.xml configured. However, one issue you may have is if multiple users are starting the HiveContext from the same path: they’ll all be trying to store the default Derby metastore in the same location. Also, if you want them to be able to persist permanent table metadata for Spark SQL, then you’ll want to set up a true metastore.
>>>>
>>>> The other thing it could be is Hive dependency collisions on the classpath, but that shouldn’t be an issue since you said it’s standalone (not a Hadoop distro, right?).
>>>>
>>>> Thanks,
>>>>
>>>> Silvio
>>>>
>>>> *From:* Gerard Maas <gerard.m...@gmail.com>
>>>> *Date:* Thursday, May 26, 2016 at 5:28 AM
>>>> *To:* spark users <user@spark.apache.org>
>>>> *Subject:* HiveContext standalone => without a Hive metastore
>>>>
>>>> Hi,
>>>>
>>>> I'm helping some folks set up an analytics cluster with Spark.
>>>> They want to use the HiveContext to enable the window functions on DataFrames (*), but they don't have any Hive installation, nor do they need one at the moment (if it is not necessary for this feature).
>>>>
>>>> When we try to create a Hive context, we get the following error:
>>>>
>>>>     > val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
>>>>     java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>>>         at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>>>>
>>>> Is my HiveContext failing b/c it wants to connect to an unconfigured Hive metastore?
>>>>
>>>> Is there a way to instantiate a HiveContext for the sake of window support without an underlying Hive deployment?
>>>>
>>>> The docs are explicit in saying that this should be the case: [1]
>>>>
>>>> "To use a HiveContext, you do not need to have an existing Hive setup, and all of the data sources available to a SQLContext are still available. HiveContext is only packaged separately to avoid including all of Hive’s dependencies in the default Spark build."
>>>>
>>>> So what is the right way to address this issue? How do we instantiate a HiveContext with Spark running on an HDFS cluster without Hive deployed?
>>>>
>>>> Thanks a lot!
>>>>
>>>> -Gerard.
>>>>
>>>> (*) The need for a HiveContext to use window functions is pretty obscure. The only documentation of this seems to be a runtime exception: "org.apache.spark.sql.AnalysisException: Could not resolve window function 'max'. Note that, using window functions currently requires a HiveContext;"
>>>>
>>>> [1] http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
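For context on the window-function footnote in Gerard's question: the usage that triggers the "Could not resolve window function 'max'" exception looks roughly like this on Spark 1.x (a sketch; the DataFrame contents and column names are made up):

```scala
// Sketch for Spark 1.x: window functions over DataFrames need a
// HiveContext; the same query through a plain SQLContext raises
// "Could not resolve window function 'max'".
// `sc` is the SparkContext provided by spark-shell; data is made up.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.max

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext.implicits._

val df = sc.parallelize(Seq(("a", 1), ("a", 3), ("b", 2)))
           .toDF("key", "value")

// Maximum value per key, attached to every row of that key's partition.
val w = Window.partitionBy("key")
df.withColumn("max_per_key", max($"value").over(w)).show()
```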