PS: The CREATE EXTERNAL TABLE command lets Impala query HDFS-resident files
without taking ownership of their location or lifetime.
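For example, assuming the data already sits under an HDFS path (the table and path names below are made up), a sketch might look like:

```sql
-- Point an external table at existing files; dropping the table
-- later removes only the metadata and leaves the files in place.
CREATE EXTERNAL TABLE sales (
  id     BIGINT,
  amount DECIMAL(10,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/etl/sales';
```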

On Tue, Sep 26, 2017 at 4:18 PM, Tim Wood <[email protected]> wrote:

>
> Hello Sky,
> First, I'm sorry I missed your note the week it came in.
>
> Since your questions can be read from several different perspectives, I'll
> just share a few general ideas and suggestions.
>
> There are a few ways to get large volumes of data into Impala.  Several of
> them trade extra preparation time and effort up front for better load
> performance, at the cost of reduced error checking, for example.  A series
> of INSERT statements is inefficient, as you point out, because it does not
> amortize the per-query overhead over the volume of data, and it checks
> every value of every incoming row.
>
> It's not clear which imperfections of Sqoop you refer to; however, Impala
> does support loading data into HDFS with Sqoop and then defining a schema
> on top of it after the fact.  If you know your complete schema and are
> confident it fits the data you loaded, you can use CREATE TABLE ...
> LOCATION ... to point the new definition at the newly loaded files.
> If you load partitioned data, you can follow these commands with ALTER
> TABLE ... RECOVER PARTITIONS and Impala will find new rows loaded into
> partition directories and bind them to the table.
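> As a rough sketch (table name, columns, and path are all made up here):
>
> ```sql
> -- Define a table over files Sqoop already wrote to HDFS.
> CREATE TABLE events (
>   event_id BIGINT,
>   payload  STRING
> )
> PARTITIONED BY (dt STRING)
> STORED AS TEXTFILE
> LOCATION '/user/etl/events';
>
> -- Pick up partition directories that already exist under that path.
> ALTER TABLE events RECOVER PARTITIONS;
> ```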
>
> Impala has a limited ability to discover a schema for loaded data, if the
> destination format contains enough metadata.  For example, you could load
> data into HDFS in Parquet format, then issue CREATE TABLE ... LIKE PARQUET
> ..., referencing the new files, and Impala will build that table's metadata
> from the files.  Column types would be limited to those representable in
> Parquet, and Parquet is the only format for which Impala implements this
> feature.
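> Something like the following, where the table name and the Parquet file
> path are hypothetical:
>
> ```sql
> -- Derive column names and types from an existing Parquet data file.
> CREATE TABLE events_pq
>   LIKE PARQUET '/user/etl/events_pq/part-00000.parquet'
>   STORED AS PARQUET
>   LOCATION '/user/etl/events_pq';
> ```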
>
> Finally, the LOAD DATA command allows you to populate already-created
> tables in Impala with data from another file *already stored in HDFS*.
> LOAD DATA does not populate tables from arbitrary files in the OS
> filesystem namespace.
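> For instance, assuming a staging file and target table like these (both
> names invented for illustration):
>
> ```sql
> -- Move an HDFS file into the table's data directory; the source
> -- file must already be in HDFS, not on a local disk.
> LOAD DATA INPATH '/user/etl/staging/batch1.csv' INTO TABLE sales;
> ```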
>
> Hope this helps!
> TW
>
>
> ---------- Forwarded message ----------
>> From: sky <[email protected]>
>> Date: Wed, Sep 13, 2017 at 7:08 PM
>>
>> Subject: Data Transfer Between Different Databases
>> To: "[email protected]" <[email protected]>
>>
>>
>> Hi all,
>>     How does Impala exchange data with other relational databases?
>> Sqoop's functionality is not perfect, and in Impala each insert has 100 ms
>> of query-plan overhead. Are there any other easy ways to transfer data?
>>
>>
>
