I see, thanks for the info!
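For anyone who finds this thread later, here is a minimal sketch of the difference Xiao describes below (assuming Spark 2.1 with the in-memory catalog; the table names are just for illustration, and the snippet is a sketch rather than something tested against a particular build):

    import org.apache.spark.sql.SparkSession

    // Session backed by the in-memory catalog instead of the Hive metastore.
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.catalogImplementation", "in-memory")
      .getOrCreate()

    // A regular data source table: its metadata lives only in the
    // InMemoryCatalog, and both creating it and inserting into it work.
    spark.sql("CREATE TABLE t1 (id INT, name STRING, dept STRING) USING parquet")
    spark.sql("INSERT INTO t1 VALUES (1, 'name1', 'dept1')")
    spark.sql("SELECT * FROM t1").show()

    // A Hive serde table (plain CREATE TABLE without a USING clause); this is
    // the case https://github.com/apache/spark/pull/16587 rejects up front
    // when Hive support is not available.
    // spark.sql("CREATE TABLE t2 (id INT, name STRING, dept STRING)")

The metadata for t1 disappears when the session stops, which is the "persistently stored or not" difference Xiao mentions below.
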
On Mon, Jan 23, 2017 at 4:12 PM, Xiao Li <gatorsm...@gmail.com> wrote:

> Reynold mentioned the direction we are heading in. You can see that many
> PRs the community has submitted are for this target. To achieve this,
> there is a lot of work we need to do.
>
> For example, for some serdes, the Hive metastore will infer the schema
> when the schema is not provided, but our InMemoryCatalog does not have
> such a capability. Thus, we need to see how to resolve this.
>
> Hopefully that answers your question. BTW, the issue you mentioned at the
> beginning has been resolved. Please fetch the latest master. You are now
> unable to create such a Hive serde table without Hive support.
>
> Thanks,
>
> Xiao Li
>
> 2017-01-23 0:01 GMT-08:00 Shuai Lin <linshuai2...@gmail.com>:
>
>> Cool, thanks for the info.
>>
>>> I think this is something we are going to change to completely decouple
>>> the Hive support and catalog.
>>
>> Is there a ticket for this? I did a search in JIRA and only found
>> "SPARK-16275: Implement all the Hive fallback functions", which seems to
>> be related to it.
>>
>> On Mon, Jan 23, 2017 at 3:21 AM, Xiao Li <gatorsm...@gmail.com> wrote:
>>
>>> Agree. : )
>>>
>>> 2017-01-22 11:20 GMT-08:00 Reynold Xin <r...@databricks.com>:
>>>
>>>> To be clear, there are two separate "Hive"s we are talking about here.
>>>> One is the catalog, and the other is the Hive serde and UDF support.
>>>> We want to get to a point where the choice of catalog does not impact
>>>> the functionality in Spark other than where the catalog is stored.
>>>>
>>>> On Sun, Jan 22, 2017 at 11:18 AM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>
>>>>> We have a pending PR to block users from creating a Hive serde table
>>>>> when using InMemoryCatalog. See:
>>>>> https://github.com/apache/spark/pull/16587
>>>>> I believe it answers your question.
>>>>>
>>>>> BTW, we can still create regular data source tables and insert data
>>>>> into them. The major difference is whether the metadata is
>>>>> persistently stored or not.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Xiao Li
>>>>>
>>>>> 2017-01-22 11:14 GMT-08:00 Reynold Xin <r...@databricks.com>:
>>>>>
>>>>> I think this is something we are going to change to completely
>>>>> decouple the Hive support and catalog.
>>>>>
>>>>> On Sun, Jan 22, 2017 at 4:51 AM Shuai Lin <linshuai2...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Currently, when the in-memory catalog is used, e.g. through `--conf
>>>>> spark.sql.catalogImplementation=in-memory`, we can create a
>>>>> persistent table, but inserting into this table fails with the error
>>>>> message "Hive support is required to insert into the following
>>>>> tables..".
>>>>>
>>>>> sql("create table t1 (id int, name string, dept string)") // OK
>>>>> sql("insert into t1 values (1, 'name1', 'dept1')") // ERROR
>>>>>
>>>>> This doesn't make sense to me, because this table would always be
>>>>> empty if we can't insert into it, and thus would be of no use. But I
>>>>> wonder if there are other good reasons for the current logic. If not,
>>>>> I would propose raising an error when creating the table in the first
>>>>> place.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Regards,
>>>>> Shuai Lin (@lins05)