>From the docs https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
I conclude that Hive2 is not a JDBC Type 4 driver. Only JDBC Type 4 drivers are supported by JDBC plugin. JDBC Type 4 does no longer need Class.forName. Jörg On Mon, Jul 28, 2014 at 1:27 PM, Santosh B <[email protected]> wrote: > Hi, > Its a very good feature. > I was trying to use JDBC driver to import from hive/Impala but it never > works whereas mysql connector works perfectly fine. > Is it something it was specifically designed to work for mysql,MSSQl...and > few of them or any other databases which supports JDBC. > > Thanks, > Santosh B > > On Sunday, 17 June 2012 02:29:18 UTC+5:30, Jörg Prante wrote: >> >> Hi, >> >> I'd like to announce a JDBC river implementation. >> >> It can be found at https://github.com/jprante/elasticsearch-river-jdbc >> >> I hope it is useful for all of you who need to index data from SQL >> databases into ElasticSearch. >> >> Suggestions, corrections, improvements are welcome! >> >> Introduction >> ------------ >> >> The Java Database Connection (JDBC) river allows to select data from JDBC >> sources for indexing into ElasticSearch. >> >> It is implemented as an Elasticsearch plugin. >> >> The relational data is internally transformed into structured JSON >> objects for ElasticSearch schema-less indexing. >> >> Setting it up is as simple as executing something like the following >> against ElasticSearch: >> >> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >> "type" : "jdbc", >> "jdbc" : { >> "driver" : "com.mysql.jdbc.Driver", >> "url" : "jdbc:mysql://localhost:3306/test", >> "user" : "", >> "password" : "", >> "sql" : "select * from orders", >> } >> }' >> >> This HTTP PUT statement will create a river named `my_jdbc_river` >> that fetches all the rows from the `orders` table in the MySQL database >> `test` at `localhost`. >> >> You have to install the JDBC driver jar of your favorite database >> manually into >> the `plugins` directory where the jar file of the JDBC river plugin >> resides. >> >> By default, the JDBC river re-executes the SQL statement on a regular >> basis (60 minutes). >> >> In case of a failover, the JDBC river will automatically be restarted >> on another ElasticSearch node, and continue indexing. >> >> Many JDBC rivers can run in parallel. Each river opens one thread to >> select >> the data. >> >> Installation >> ------------ >> >> In order to install the plugin, simply run: `bin/plugin -install >> jprante/elasticsearch-river-jdbc/1.0.0`. >> >> Log example of river creation >> ----------------------------- >> [2012-06-16 18:50:10,035][INFO ][cluster.metadata ] [Anomaly] >> [_river] update_mapping [my_jdbc_river] (dynamic) >> [2012-06-16 18:50:10,046][INFO ][river.jdbc ] [Anomaly] >> [jdbc][my_jdbc_river] starting JDBC connector: URL >> [jdbc:mysql://localhost:3306/test], driver [com.mysql.jdbc.Driver], sql >> [select >> * from orders], indexing to [jdbc]/[jdbc], poll [1h] >> [2012-06-16 18:50:10,129][INFO ][cluster.metadata ] [Anomaly] >> [jdbc] creating index, cause [api], shards [5]/[1], mappings [] >> [2012-06-16 18:50:10,353][INFO ][cluster.metadata ] [Anomaly] >> [_river] update_mapping [my_jdbc_river] (dynamic) >> [2012-06-16 18:50:10,714][INFO ][river.jdbc ] [Anomaly] >> [jdbc][my_jdbc_river] got 5 rows >> [2012-06-16 18:50:10,719][INFO ][river.jdbc ] [Anomaly] >> [jdbc][my_jdbc_river] next run, waiting 1h, URL >> [jdbc:mysql://localhost:3306/test] driver [com.mysql.jdbc.Driver] sql >> [select * from orders] >> >> Configuration >> ============= >> >> The SQL statements used for selecting can be configured as follows. >> >> Star query >> ---------- >> >> Star queries are the simplest variant of selecting data. They can be used >> to dump tables into ElasticSearch. >> >> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >> "type" : "jdbc", >> "jdbc" : { >> "driver" : "com.mysql.jdbc.Driver", >> "url" : "jdbc:mysql://localhost:3306/test", >> "user" : "", >> "password" : "", >> "sql" : "select * from orders" >> } >> }' >> >> For example >> >> mysql> select * from orders; >> +----------+-----------------+---------+----------+---------------------+ >> | customer | department | product | quantity | created | >> +----------+-----------------+---------+----------+---------------------+ >> | Big | American Fruits | Apples | 1 | 0000-00-00 00:00:00 | >> | Large | German Fruits | Bananas | 1 | 0000-00-00 00:00:00 | >> | Huge | German Fruits | Oranges | 2 | 0000-00-00 00:00:00 | >> | Good | German Fruits | Apples | 2 | 2012-06-01 00:00:00 | >> | Bad | English Fruits | Oranges | 3 | 2012-06-01 00:00:00 | >> +----------+-----------------+---------+----------+---------------------+ >> 5 rows in set (0.00 sec) >> >> The JSON objects are flat, the `id` >> of the documents is generated automatically, it is the row number. >> >> id=0 {"product":"Apples","created":null,"department":"American >> Fruits","quantity":1,"customer":"Big"} >> id=1 {"product":"Bananas","created":null,"department":"German >> Fruits","quantity":1,"customer":"Large"} >> id=2 {"product":"Oranges","created":null,"department":"German >> Fruits","quantity":2,"customer":"Huge"} >> id=3 {"product":"Apples","created":1338501600000,"department":"German >> Fruits","quantity":2,"customer":"Good"} >> id=4 {"product":"Oranges","created":1338501600000,"department":"English >> Fruits","quantity":3,"customer":"Bad"} >> >> Labeled columns >> --------------- >> >> In SQL, each column may be labeled with a name. This name is used by the >> JDBC river to JSON object construction. >> >> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >> "type" : "jdbc", >> "jdbc" : { >> "driver" : "com.mysql.jdbc.Driver", >> "url" : "jdbc:mysql://localhost:3306/test", >> "user" : "", >> "password" : "", >> "sql" : "select products.name as \"product.name\", >> orders.customer as \"product.customer.name\", orders.quantity * >> products.price as \"product.customer.bill\" from products, orders where >> products.name = orders.product" >> } >> }' >> >> In this query, the columns selected are described as `product.name`, >> `product.customer.name`, and `product.customer.bill`. >> >> mysql> select products.name as "product.name", orders.customer as >> "product.customer", orders.quantity * products.price as >> "product.customer.bill" from products, orders where products.name = >> orders.product ; >> +--------------+------------------+-----------------------+ >> | product.name | product.customer | product.customer.bill | >> +--------------+------------------+-----------------------+ >> | Apples | Big | 1 | >> | Bananas | Large | 2 | >> | Oranges | Huge | 6 | >> | Apples | Good | 2 | >> | Oranges | Bad | 9 | >> +--------------+------------------+-----------------------+ >> 5 rows in set, 5 warnings (0.00 sec) >> >> The JSON objects are >> >> id=0 {"product":{"name":"Apples","customer":{"bill":1.0,"name":"Big"}}} >> id=1 {"product":{"name":"Bananas","customer":{"bill":2.0,"name":" >> Large"}}} >> id=2 {"product":{"name":"Oranges","customer":{"bill":6.0,"name":"Huge"}}} >> id=3 {"product":{"name":"Apples","customer":{"bill":2.0,"name":"Good"}}} >> id=4 {"product":{"name":"Oranges","customer":{"bill":9.0,"name":"Bad"}}} >> >> There are three column labels with an underscore as prefix >> that are mapped to the Elasticsearch index/type/id. >> >> _id >> _type >> _index >> >> Structured objects >> ------------------ >> >> One of the advantage of SQL queries is the join operation. From many >> tables, new tuples can be formed. >> >> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >> "type" : "jdbc", >> "jdbc" : { >> "driver" : "com.mysql.jdbc.Driver", >> "url" : "jdbc:mysql://localhost:3306/test", >> "user" : "", >> "password" : "", >> "sql" : "select \"relations\" as \"_index\", orders.customer as >> \"_id\", orders.customer as \"contact.customer\", employees.name as >> \"contact.employee\" from orders left join employees on >> employees.department = orders.department" >> } >> }' >> >> For example, these rows from SQL >> >> mysql> select "relations" as "_index", orders.customer as "_id", >> orders.customer as "contact.customer", employees.name as >> "contact.employee" from orders left join employees on employees.department >> = orders.department; >> +-----------+-------+------------------+------------------+ >> | _index | _id | contact.customer | contact.employee | >> +-----------+-------+------------------+------------------+ >> | relations | Big | Big | Smith | >> | relations | Large | Large | Müller | >> | relations | Large | Large | Meier | >> | relations | Large | Large | Schulze | >> | relations | Huge | Huge | Müller | >> | relations | Huge | Huge | Meier | >> | relations | Huge | Huge | Schulze | >> | relations | Good | Good | Müller | >> | relations | Good | Good | Meier | >> | relations | Good | Good | Schulze | >> | relations | Bad | Bad | Jones | >> +-----------+-------+------------------+------------------+ >> 11 rows in set (0.00 sec) >> >> will generate fewer JSON objects for the index `relations`. >> >> index=relations id=Big {"contact":{"employee":"Smith","customer":"Big"}} >> index=relations id=Large {"contact":{"employee":[" >> Müller","Meier","Schulze"],"customer":"Large"}} >> index=relations id=Huge {"contact":{"employee":[" >> Müller","Meier","Schulze"],"customer":"Huge"}} >> index=relations id=Good {"contact":{"employee":[" >> Müller","Meier","Schulze"],"customer":"Good"}} >> index=relations id=Bad {"contact":{"employee":"Jones","customer":"Bad"}} >> >> Note how the `employee` column is collapsed into a JSON array. The >> repeated occurence of the `_id` column >> controls how values are folded into arrays for making use of the >> ElasticSearch JSON data model. >> >> Bind parameter >> -------------- >> >> Bind parameters are useful for selecting rows according to a matching >> condition >> where the match criteria is not known beforehand. >> >> For example, only rows matching certain conditions can be indexed into >> ElasticSearch. >> >> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >> "type" : "jdbc", >> "jdbc" : { >> "driver" : "com.mysql.jdbc.Driver", >> "url" : "jdbc:mysql://localhost:3306/test", >> "user" : "", >> "password" : "", >> "sql" : "select products.name as \"product.name\", >> orders.customer as \"product.customer.name\", orders.quantity * >> products.price as \"product.customer.bill\" from products, orders where >> products.name = orders.product and orders.quantity * products.price > ?", >> "params: [ 5.0 ] >> } >> }' >> >> Example result >> >> id=0 {"product":{"name":"Oranges","customer":{"bill":6.0,"name":"Huge"}}} >> id=1 {"product":{"name":"Oranges","customer":{"bill":9.0,"name":"Bad"}}} >> >> >> Time-based selecting >> -------------------- >> >> Because the JDBC river is running repeatedly, time-based selecting is >> useful. >> The current time is represented by the parameter value `$now`. >> >> In this example, all rows beginning with a certain date up to now are >> selected. >> >> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >> "type" : "jdbc", >> "jdbc" : { >> "driver" : "com.mysql.jdbc.Driver", >> "url" : "jdbc:mysql://localhost:3306/test", >> "user" : "", >> "password" : "", >> "sql" : "select products.name as \"product.name\", >> orders.customer as \"product.customer.name\", orders.quantity * >> products.price as \"product.customer.bill\" from products, orders where >> products.name = orders.product and orders.created between ? - 14 and ?", >> "params: [ 2012-06-01", "$now" ] >> } >> }' >> >> Example result: >> >> id=0 {"product":{"name":"Apples","customer":{"bill":2.0,"name":"Good"}}} >> id=1 {"product":{"name":"Oranges","customer":{"bill":9.0,"name":"Bad"}}} >> >> Index >> ----- >> >> Each river can index into a specified index. Example: >> >> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >> "type" : "jdbc", >> "jdbc" : { >> "driver" : "com.mysql.jdbc.Driver", >> "url" : "jdbc:mysql://localhost:3306/test", >> "user" : "", >> "password" : "", >> "sql" : "select * from orders", >> }, >> "index" : { >> "index" : "jdbc", >> "type" : "jdbc" >> } >> }' >> >> Bulk indexing >> ------------- >> >> Bulk indexing is automatically used in order to speed up the indexing >> process. >> >> Each SQL result set will be indexed by a single bulk if the bulk size is >> not specified. >> >> A bulk size can be defined, also a maximum size of active bulk requests >> to cope with high load situations. >> A bulk timeout defines the time period after which bulk feeds continue. >> >> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >> "type" : "jdbc", >> "jdbc" : { >> "driver" : "com.mysql.jdbc.Driver", >> "url" : "jdbc:mysql://localhost:3306/test", >> "user" : "", >> "password" : "", >> "sql" : "select * from orders", >> }, >> "index" : { >> "index" : "jdbc", >> "type" : "jdbc", >> "bulk_size" : 100, >> "max_bulk_requests" : 30, >> "bulk_timeout" : "60s" >> } >> }' >> >> >> Stopping/deleting the river >> --------------------------- >> >> curl -XDELETE 'localhost:9200/_river/my_jdbc_river/' >> >> >> Best regards,, >> >> Jörg >> >> -- > You received this message because you are subscribed to the Google Groups > "elasticsearch" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/elasticsearch/17dc1eed-3a7c-4b7c-992d-bd0c65b2e3ff%40googlegroups.com > <https://groups.google.com/d/msgid/elasticsearch/17dc1eed-3a7c-4b7c-992d-bd0c65b2e3ff%40googlegroups.com?utm_medium=email&utm_source=footer> > . > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHRZw-sFaPrR6VmJqNDFUtRMTYuKHA6h1_sukWZsZWWMA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
