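PS The CREATE EXTERNAL TABLE command lets Impala query HDFS-resident files in place, without moving them or taking over their lifetime: dropping the table removes only the table definition and leaves the files where they are.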
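
A minimal sketch of an external table over files already in HDFS, for example a directory written by Sqoop. The table name, columns, and path are placeholders I made up, and the files are assumed to be comma-delimited text; adjust these to match your data.

```sql
-- Hypothetical example: table name, columns, and HDFS path are placeholders.
-- Assumes the directory holds comma-delimited text files.
CREATE EXTERNAL TABLE sales_import (
  id BIGINT,
  customer STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/sky/sales_import';

-- Removes only the table definition; the files stay in HDFS.
DROP TABLE sales_import;
```

If more files land in that directory later, running REFRESH on the table should make Impala pick them up.
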
On Tue, Sep 26, 2017 at 4:18 PM, Tim Wood <[email protected]> wrote:
>
> Hello Sky,
> First, I'm sorry I missed your note the week it came in.
>
> As I can read your questions from several different perspectives, I'll
> just share a few general ideas and suggestions.
>
> There are a few ways to connect up Impala with lots of data. Several of
> them trade off preparation time and effort in advance in exchange for
> performance with reduced error checking, for example. A series of INSERT
> statements is inefficient, as you point out, because it does not amortize
> the per-query overhead over the volume of data, and it checks every value
> of every incoming row.
>
> It's not clear which imperfections of Sqoop you refer to; however, Impala
> does support loading data into HDFS with Sqoop, then defining a schema on
> top of it after the fact. If you know your complete schema and have high
> confidence it fits the data you loaded, you can use CREATE TABLE ...
> LOCATION ... to make the new definition point to the newly-loaded files.
> If you load partitioned data, you can follow these commands with ALTER
> TABLE ... RECOVER PARTITIONS and Impala will find new rows loaded into
> partition directories and bind them to the table.
>
> Impala has a limited ability to discover a schema for loaded data, if the
> destination format contains enough metadata. For example, you could load
> data into HDFS in Parquet format, then issue CREATE TABLE ... LIKE PARQUET
> ..., referencing the new files, and Impala will build that table's metadata
> from the files. Column types would be limited to those representable in
> Parquet, and Parquet is the only format for which Impala implements this
> feature.
>
> Finally, the LOAD DATA command allows you to populate already-created
> tables in Impala with data from another file *already stored in HDFS*.
> LOAD DATA does not populate tables from arbitrary files in the OS
> filesystem namespace.
>
> Hope this helps!
> TW
>
>
> ---------- Forwarded message ----------
>> From: sky <[email protected]>
>> Date: Wed, Sep 13, 2017 at 7:08 PM
>> Subject: Data Transfer Between Different Databases
>> To: "[email protected]" <[email protected]>
>>
>>
>> Hi all,
>> How does Impala exchange data with other relational databases?
>> Sqoop's functionality is not perfect, and in Impala, each insert has 100ms
>> query plan overhead. Are there any other easy ways to interact?
>>
>>
>
