Thanks a lot for sharing....
On Mon, Jul 28, 2014 at 6:16 PM, [email protected] < [email protected]> wrote: > From the docs > > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients > > I conclude that Hive2 is not a JDBC Type 4 driver. Only JDBC Type 4 > drivers are supported by JDBC plugin. JDBC Type 4 does no longer need > Class.forName. > > Jörg > > > > On Mon, Jul 28, 2014 at 1:27 PM, Santosh B <[email protected]> > wrote: > >> Hi, >> Its a very good feature. >> I was trying to use JDBC driver to import from hive/Impala but it never >> works whereas mysql connector works perfectly fine. >> Is it something it was specifically designed to work for >> mysql,MSSQl...and few of them or any other databases which supports JDBC. >> >> Thanks, >> Santosh B >> >> On Sunday, 17 June 2012 02:29:18 UTC+5:30, Jörg Prante wrote: >>> >>> Hi, >>> >>> I'd like to announce a JDBC river implementation. >>> >>> It can be found at https://github.com/jprante/elasticsearch-river-jdbc >>> >>> I hope it is useful for all of you who need to index data from SQL >>> databases into ElasticSearch. >>> >>> Suggestions, corrections, improvements are welcome! >>> >>> Introduction >>> ------------ >>> >>> The Java Database Connection (JDBC) river allows to select data from >>> JDBC sources for indexing into ElasticSearch. >>> >>> It is implemented as an Elasticsearch plugin. >>> >>> The relational data is internally transformed into structured JSON >>> objects for ElasticSearch schema-less indexing. >>> >>> Setting it up is as simple as executing something like the following >>> against ElasticSearch: >>> >>> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >>> "type" : "jdbc", >>> "jdbc" : { >>> "driver" : "com.mysql.jdbc.Driver", >>> "url" : "jdbc:mysql://localhost:3306/test", >>> "user" : "", >>> "password" : "", >>> "sql" : "select * from orders", >>> } >>> }' >>> >>> This HTTP PUT statement will create a river named `my_jdbc_river` >>> that fetches all the rows from the `orders` table in the MySQL database >>> `test` at `localhost`. >>> >>> You have to install the JDBC driver jar of your favorite database >>> manually into >>> the `plugins` directory where the jar file of the JDBC river plugin >>> resides. >>> >>> By default, the JDBC river re-executes the SQL statement on a regular >>> basis (60 minutes). >>> >>> In case of a failover, the JDBC river will automatically be restarted >>> on another ElasticSearch node, and continue indexing. >>> >>> Many JDBC rivers can run in parallel. Each river opens one thread to >>> select >>> the data. >>> >>> Installation >>> ------------ >>> >>> In order to install the plugin, simply run: `bin/plugin -install >>> jprante/elasticsearch-river-jdbc/1.0.0`. >>> >>> Log example of river creation >>> ----------------------------- >>> [2012-06-16 18:50:10,035][INFO ][cluster.metadata ] [Anomaly] >>> [_river] update_mapping [my_jdbc_river] (dynamic) >>> [2012-06-16 18:50:10,046][INFO ][river.jdbc ] [Anomaly] >>> [jdbc][my_jdbc_river] starting JDBC connector: URL >>> [jdbc:mysql://localhost:3306/test], driver [com.mysql.jdbc.Driver], sql >>> [select >>> * from orders], indexing to [jdbc]/[jdbc], poll [1h] >>> [2012-06-16 18:50:10,129][INFO ][cluster.metadata ] [Anomaly] >>> [jdbc] creating index, cause [api], shards [5]/[1], mappings [] >>> [2012-06-16 18:50:10,353][INFO ][cluster.metadata ] [Anomaly] >>> [_river] update_mapping [my_jdbc_river] (dynamic) >>> [2012-06-16 18:50:10,714][INFO ][river.jdbc ] [Anomaly] >>> [jdbc][my_jdbc_river] got 5 rows >>> [2012-06-16 18:50:10,719][INFO ][river.jdbc ] [Anomaly] >>> [jdbc][my_jdbc_river] next run, waiting 1h, URL >>> [jdbc:mysql://localhost:3306/test] driver [com.mysql.jdbc.Driver] sql >>> [select * from orders] >>> >>> Configuration >>> ============= >>> >>> The SQL statements used for selecting can be configured as follows. >>> >>> Star query >>> ---------- >>> >>> Star queries are the simplest variant of selecting data. They can be used >>> to dump tables into ElasticSearch. >>> >>> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >>> "type" : "jdbc", >>> "jdbc" : { >>> "driver" : "com.mysql.jdbc.Driver", >>> "url" : "jdbc:mysql://localhost:3306/test", >>> "user" : "", >>> "password" : "", >>> "sql" : "select * from orders" >>> } >>> }' >>> >>> For example >>> >>> mysql> select * from orders; >>> +----------+-----------------+---------+----------+--------- >>> ------------+ >>> | customer | department | product | quantity | created >>> | >>> +----------+-----------------+---------+----------+--------- >>> ------------+ >>> | Big | American Fruits | Apples | 1 | 0000-00-00 >>> 00:00:00 | >>> | Large | German Fruits | Bananas | 1 | 0000-00-00 00:00:00 | >>> | Huge | German Fruits | Oranges | 2 | 0000-00-00 >>> 00:00:00 | >>> | Good | German Fruits | Apples | 2 | 2012-06-01 00:00:00 | >>> | Bad | English Fruits | Oranges | 3 | 2012-06-01 >>> 00:00:00 | >>> +----------+-----------------+---------+----------+--------- >>> ------------+ >>> 5 rows in set (0.00 sec) >>> >>> The JSON objects are flat, the `id` >>> of the documents is generated automatically, it is the row number. >>> >>> id=0 {"product":"Apples","created":null,"department":"American >>> Fruits","quantity":1,"customer":"Big"} >>> id=1 {"product":"Bananas","created":null,"department":"German >>> Fruits","quantity":1,"customer":"Large"} >>> id=2 {"product":"Oranges","created":null,"department":"German >>> Fruits","quantity":2,"customer":"Huge"} >>> id=3 {"product":"Apples","created":1338501600000,"department":"German >>> Fruits","quantity":2,"customer":"Good"} >>> id=4 {"product":"Oranges","created":1338501600000,"department":"English >>> Fruits","quantity":3,"customer":"Bad"} >>> >>> Labeled columns >>> --------------- >>> >>> In SQL, each column may be labeled with a name. This name is used by the >>> JDBC river to JSON object construction. >>> >>> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >>> "type" : "jdbc", >>> "jdbc" : { >>> "driver" : "com.mysql.jdbc.Driver", >>> "url" : "jdbc:mysql://localhost:3306/test", >>> "user" : "", >>> "password" : "", >>> "sql" : "select products.name as \"product.name\", >>> orders.customer as \"product.customer.name\", orders.quantity * >>> products.price as \"product.customer.bill\" from products, orders where >>> products.name = orders.product" >>> } >>> }' >>> >>> In this query, the columns selected are described as `product.name`, >>> `product.customer.name`, and `product.customer.bill`. >>> >>> mysql> select products.name as "product.name", orders.customer as >>> "product.customer", orders.quantity * products.price as >>> "product.customer.bill" from products, orders where products.name = >>> orders.product ; >>> +--------------+------------------+-----------------------+ >>> | product.name | product.customer | product.customer.bill | >>> +--------------+------------------+-----------------------+ >>> | Apples | Big | 1 | >>> | Bananas | Large | 2 | >>> | Oranges | Huge | 6 | >>> | Apples | Good | 2 | >>> | Oranges | Bad | 9 | >>> +--------------+------------------+-----------------------+ >>> 5 rows in set, 5 warnings (0.00 sec) >>> >>> The JSON objects are >>> >>> id=0 {"product":{"name":"Apples","customer":{"bill":1.0,"name":"Big"}}} >>> id=1 {"product":{"name":"Bananas","customer":{"bill":2.0,"name":" >>> Large"}}} >>> id=2 {"product":{"name":"Oranges","customer":{"bill":6.0,"name":" >>> Huge"}}} >>> id=3 {"product":{"name":"Apples","customer":{"bill":2.0,"name":" >>> Good"}}} >>> id=4 {"product":{"name":"Oranges","customer":{"bill":9.0,"name":"Bad"}}} >>> >>> There are three column labels with an underscore as prefix >>> that are mapped to the Elasticsearch index/type/id. >>> >>> _id >>> _type >>> _index >>> >>> Structured objects >>> ------------------ >>> >>> One of the advantage of SQL queries is the join operation. From many >>> tables, new tuples can be formed. >>> >>> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >>> "type" : "jdbc", >>> "jdbc" : { >>> "driver" : "com.mysql.jdbc.Driver", >>> "url" : "jdbc:mysql://localhost:3306/test", >>> "user" : "", >>> "password" : "", >>> "sql" : "select \"relations\" as \"_index\", orders.customer as >>> \"_id\", orders.customer as \"contact.customer\", employees.name as >>> \"contact.employee\" from orders left join employees on >>> employees.department = orders.department" >>> } >>> }' >>> >>> For example, these rows from SQL >>> >>> mysql> select "relations" as "_index", orders.customer as "_id", >>> orders.customer as "contact.customer", employees.name as >>> "contact.employee" from orders left join employees on employees.department >>> = orders.department; >>> +-----------+-------+------------------+------------------+ >>> | _index | _id | contact.customer | contact.employee | >>> +-----------+-------+------------------+------------------+ >>> | relations | Big | Big | Smith | >>> | relations | Large | Large | Müller | >>> | relations | Large | Large | Meier | >>> | relations | Large | Large | Schulze | >>> | relations | Huge | Huge | Müller | >>> | relations | Huge | Huge | Meier | >>> | relations | Huge | Huge | Schulze | >>> | relations | Good | Good | Müller | >>> | relations | Good | Good | Meier | >>> | relations | Good | Good | Schulze | >>> | relations | Bad | Bad | Jones | >>> +-----------+-------+------------------+------------------+ >>> 11 rows in set (0.00 sec) >>> >>> will generate fewer JSON objects for the index `relations`. >>> >>> index=relations id=Big {"contact":{"employee":"Smith","customer":"Big"}} >>> index=relations id=Large {"contact":{"employee":[" >>> Müller","Meier","Schulze"],"customer":"Large"}} >>> index=relations id=Huge {"contact":{"employee":[" >>> Müller","Meier","Schulze"],"customer":"Huge"}} >>> index=relations id=Good {"contact":{"employee":[" >>> Müller","Meier","Schulze"],"customer":"Good"}} >>> index=relations id=Bad {"contact":{"employee":"Jones" >>> ,"customer":"Bad"}} >>> >>> Note how the `employee` column is collapsed into a JSON array. The >>> repeated occurence of the `_id` column >>> controls how values are folded into arrays for making use of the >>> ElasticSearch JSON data model. >>> >>> Bind parameter >>> -------------- >>> >>> Bind parameters are useful for selecting rows according to a matching >>> condition >>> where the match criteria is not known beforehand. >>> >>> For example, only rows matching certain conditions can be indexed into >>> ElasticSearch. >>> >>> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >>> "type" : "jdbc", >>> "jdbc" : { >>> "driver" : "com.mysql.jdbc.Driver", >>> "url" : "jdbc:mysql://localhost:3306/test", >>> "user" : "", >>> "password" : "", >>> "sql" : "select products.name as \"product.name\", >>> orders.customer as \"product.customer.name\", orders.quantity * >>> products.price as \"product.customer.bill\" from products, orders where >>> products.name = orders.product and orders.quantity * products.price > >>> ?", >>> "params: [ 5.0 ] >>> } >>> }' >>> >>> Example result >>> >>> id=0 {"product":{"name":"Oranges","customer":{"bill":6.0,"name":" >>> Huge"}}} >>> id=1 {"product":{"name":"Oranges","customer":{"bill":9.0,"name":" >>> Bad"}}} >>> >>> >>> Time-based selecting >>> -------------------- >>> >>> Because the JDBC river is running repeatedly, time-based selecting is >>> useful. >>> The current time is represented by the parameter value `$now`. >>> >>> In this example, all rows beginning with a certain date up to now are >>> selected. >>> >>> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >>> "type" : "jdbc", >>> "jdbc" : { >>> "driver" : "com.mysql.jdbc.Driver", >>> "url" : "jdbc:mysql://localhost:3306/test", >>> "user" : "", >>> "password" : "", >>> "sql" : "select products.name as \"product.name\", >>> orders.customer as \"product.customer.name\", orders.quantity * >>> products.price as \"product.customer.bill\" from products, orders where >>> products.name = orders.product and orders.created between ? - 14 and ?", >>> "params: [ 2012-06-01", "$now" ] >>> } >>> }' >>> >>> Example result: >>> >>> id=0 {"product":{"name":"Apples","customer":{"bill":2.0,"name":"Good"}}} >>> id=1 {"product":{"name":"Oranges","customer":{"bill":9.0,"name":" >>> Bad"}}} >>> >>> Index >>> ----- >>> >>> Each river can index into a specified index. Example: >>> >>> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >>> "type" : "jdbc", >>> "jdbc" : { >>> "driver" : "com.mysql.jdbc.Driver", >>> "url" : "jdbc:mysql://localhost:3306/test", >>> "user" : "", >>> "password" : "", >>> "sql" : "select * from orders", >>> }, >>> "index" : { >>> "index" : "jdbc", >>> "type" : "jdbc" >>> } >>> }' >>> >>> Bulk indexing >>> ------------- >>> >>> Bulk indexing is automatically used in order to speed up the indexing >>> process. >>> >>> Each SQL result set will be indexed by a single bulk if the bulk size is >>> not specified. >>> >>> A bulk size can be defined, also a maximum size of active bulk requests >>> to cope with high load situations. >>> A bulk timeout defines the time period after which bulk feeds continue. >>> >>> curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{ >>> "type" : "jdbc", >>> "jdbc" : { >>> "driver" : "com.mysql.jdbc.Driver", >>> "url" : "jdbc:mysql://localhost:3306/test", >>> "user" : "", >>> "password" : "", >>> "sql" : "select * from orders", >>> }, >>> "index" : { >>> "index" : "jdbc", >>> "type" : "jdbc", >>> "bulk_size" : 100, >>> "max_bulk_requests" : 30, >>> "bulk_timeout" : "60s" >>> } >>> }' >>> >>> >>> Stopping/deleting the river >>> --------------------------- >>> >>> curl -XDELETE 'localhost:9200/_river/my_jdbc_river/' >>> >>> >>> Best regards,, >>> >>> Jörg >>> >>> -- >> You received this message because you are subscribed to the Google Groups >> "elasticsearch" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/elasticsearch/17dc1eed-3a7c-4b7c-992d-bd0c65b2e3ff%40googlegroups.com >> <https://groups.google.com/d/msgid/elasticsearch/17dc1eed-3a7c-4b7c-992d-bd0c65b2e3ff%40googlegroups.com?utm_medium=email&utm_source=footer> >> . >> For more options, visit https://groups.google.com/d/optout. >> > > -- > You received this message because you are subscribed to a topic in the > Google Groups "elasticsearch" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/elasticsearch/TonBKhpdjsA/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHRZw-sFaPrR6VmJqNDFUtRMTYuKHA6h1_sukWZsZWWMA%40mail.gmail.com > <https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHRZw-sFaPrR6VmJqNDFUtRMTYuKHA6h1_sukWZsZWWMA%40mail.gmail.com?utm_medium=email&utm_source=footer> > . > > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAG%2Bs7e3arvL%2B9p3AM9FeT0fTHh7P_-_m1wZwzLQaMk76ex1_Ow%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
