I have confirmed this with both the elasticsearch Hive and elasticsearch MR
integrations. If both of the situations below occur, EsOutputFormat produces
an invalid header for bulk indexing:
1. es.resource contains a field to be extracted from the document
2. es.mapping.id is set to one of the fields in the document
I looked at the code and at the invalid header JSON. It is missing a ","
between "_index": "???", "_type": "???" and the rest of the fields. I believe
the following code inside AbstractBulkFactory.java is responsible. I am using
elasticsearch-hadoop 2.0.
protected void writeBeforeObject(List<Object> pieces) {
    startHeader(pieces);
    index(pieces);
    id(pieces);
    parent(pieces);
    routing(pieces);
    ttl(pieces);
    version(pieces);
    timestamp(pieces);
    otherHeader(pieces);
    endHeader(pieces);
    scriptParams(pieces);
}
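
To make the symptom concrete, here is a minimal sketch of the kind of header I am seeing next to the one I would expect. This is not the es-hadoop implementation: the class BulkHeaderSketch and its helpers field and header are hypothetical names used only for illustration, and the index, type and id values are taken from my sample data. It only shows that joining the header fields with a single separator leaves no way to drop the comma when optional fields such as _id are present.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkHeaderSketch {

    // Hypothetical helper: renders one "name":"value" JSON pair.
    // No escaping is done; this is for illustration only.
    static String field(String name, String value) {
        return "\"" + name + "\":\"" + value + "\"";
    }

    // Builds a bulk "index" action header. The fields are collected first and
    // joined with commas afterwards, so a separator cannot be skipped no matter
    // which optional fields (here only _id) are present.
    static String header(String index, String type, String id) {
        List<String> fields = new ArrayList<>();
        fields.add(field("_index", index));
        fields.add(field("_type", type));
        if (id != null) {
            fields.add(field("_id", id));
        }
        return "{\"index\":{" + String.join(",", fields) + "}}";
    }

    public static void main(String[] args) {
        // The shape of the invalid header described above: no comma before "_id".
        String broken = "{\"index\":{" + field("_index", "foo_index") + ","
                + field("_type", "first_bar") + field("_id", "1") + "}}";
        System.out.println("broken:   " + broken);
        System.out.println("expected: " + header("foo_index", "first_bar", "1"));
    }
}
```

Running main prints both headers side by side; a JSON parser rejects the first one at the quote that opens "_id", which matches the "was expecting comma to separate OBJECT entries" message in the stack trace quoted below.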
Thanks,
Jack
Jinyuan (Jack) Zhou
On Tue, Jun 17, 2014 at 6:25 AM, Costin Leau <[email protected]> wrote:
> Most likely some of your data contains invalid entries, which
> result in an invalid JSON payload being sent to ES.
> Check your ID values and/or keep an eye on issue #217 which aims to
> provide more human-friendly messages for the user.
>
> Cheers.
>
> https://github.com/elasticsearch/elasticsearch-hadoop/issues/217
>
> On 6/17/14 2:42 AM, Jinyuan Zhou wrote:
>
>> Sure, I was able to run the following command against my remote ES cluster:
>> hive -i init.hive -f search.hql
>>
>> Below are the contents of init.hive, search.hql and the data file in HDFS,
>> /user/cloudera/hivework/foobar/foobar.data
>>
>> I replaced the value of es.nodes with a fake name. Other than that, it should
>> run without problems. I am using the feature called
>> 'dynamic/multi resource writes'. It works in this example, but when I also
>> add the 'es.mapping.id' = 'id' setting, I get the following error:
>>
>> Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest:
>> Unexpected character ('"' (code 34)): was expecting
>> comma to separate OBJECT entries
>> at [Source: [B@7be1d686; line: 1, column: 53]
>> at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:300)
>> at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:278)
>>
>>
>>
>> -----init.hive----
>>
>> set es.nodes=my.remote.escluster;
>> set es.port=9200;
>> set es.index.auto.create=yes;
>> set hive.cli.print.current.db=true;
>> set hive.exec.mode.local.auto=true;
>> set mapred.map.tasks.speculative.execution=false;
>> set mapred.reduce.tasks.speculative.execution=false;
>> set hive.mapred.reduce.tasks.speculative.execution=false;
>> add jar /home/cloudera/elasticsearch-hadoop-2.0.0/dist/elasticsearch-hadoop-hive-2.0.0.jar;
>>
>> -----search.hql----
>>
>> use search;
>> DROP TABLE IF EXISTS foo;
>> CREATE EXTERNAL TABLE foo (id STRING, bar STRING, bar_type STRING)
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>> LOCATION '/user/cloudera/hivework/foobar';
>> select * from foo;
>> DROP TABLE IF EXISTS es_foo;
>> CREATE EXTERNAL TABLE es_foo (id STRING, bar STRING, bar_type STRING)
>> STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
>> TBLPROPERTIES('es.resource' = 'foo_index/{bar_type}');
>>
>> INSERT OVERWRITE TABLE es_foo SELECT * FROM foo;
>>
>> ----- /user/cloudera/hivework/foobar/foobar.data ---
>>
>> 1, bar1, first_bar
>> 2, bar2, first_bar
>> 3, foo_bar_1, second_bar
>> 4, foo_bar_12, second_bar
>> ~
>>
>>
>>
>>
>> Jinyuan (Jack) Zhou
>>
>>
>> On Mon, Jun 16, 2014 at 2:06 PM, Costin Leau <[email protected]> wrote:
>>
>> Thanks for sharing - can you also give an example of the table
>> initialization in init.hive vs myscript.hql?
>>
>> Cheers!
>>
>>
>> On 6/16/14 11:19 PM, Jinyuan Zhou wrote:
>>
>> Just sharing a solution I learned on the Hive side.
>>
>> The hive CLI has an -i option that takes a file of Hive commands to
>> initialize the session,
>> so I can put a list of set commands as well as the add jar ... command
>> in one file, say init.hive,
>> and then run the CLI like this: hive -i init.hive -f myscript.hql.
>> Note that the table-creation HQL inside myscript.hql doesn't have to
>> set es.* properties as long as they appear in the init.hive file. This
>> solves my problem.
>> Thanks,
>>
>>
>> Jinyuan (Jack) Zhou
>>
>>
>> On Sun, Jun 15, 2014 at 10:24 AM, Jinyuan Zhou <[email protected]> wrote:
>>
>> Thanks Costin,
>> I am aiming at modifying the existing Hadoop cluster and
>> Hive installation and also at modularizing some common es.*
>> properties in a separate common place. I know the first goal
>> can be achieved with the hive CLI --auxpath option and the
>> Hive table's TBLPROPERTIES. For the second goal, I am able
>> to move some es.* settings from the TBLPROPERTIES
>> declaration to Hive's set statements. For example, I can put
>>
>> set es.nodes=my.domain.com
>>
>> in the same HQL file and then skip the es.nodes setting in
>> TBLPROPERTIES in the external table declarations in the SAME
>> HQL file. But I wish I could move the set statement to a separate
>> file. I now realize this is rather a Hive question.
>> Regards,
>> Jack
>>
>>
>> On Sun, Jun 15, 2014 at 2:19 AM, Costin Leau <[email protected]> wrote:
>>
>> Could you please raise an issue with some type of
>> example? Due to the way Hadoop (and Hive) works,
>> things tend to be tricky in terms of configuring a job.
>>
>> The configuration needs to be created before a job is
>> submitted, which in practice means "dynamic configurations"
>> are basically impossible (this also has some security
>> implications which are simply avoided this way).
>> Thus either one specifies the configuration manually or
>> loads a known location file (hive-site.xml, core-site.xml...)
>> upfront, before the job is submitted.
>> This means that when dealing with Hive, Pig, Cascading,
>> etc., unless one adds a pre-processor to the job content
>> (script, flow, etc.), by the time es-hadoop kicks in, the job
>> is already running and thus its changes are discarded.
>>
>> Cheers,
>>
>> On 6/14/14 1:57 AM, Jinyuan Zhou wrote:
>>
>> Hi,
>> I am playing with the elasticsearch and Hive
>> integration. The documentation says
>> to set configuration like es.nodes and es.port in
>> TBLPROPERTIES. It works.
>> But it can lead to a lot of redundant code. If I have ten
>> data sets to index into the same ES cluster,
>> I have to repeat this information ten times
>> in TBLPROPERTIES. Even if
>> I use variable substitution, I still have to rewrite
>> the substitution variable for each table definition.
>> What I am looking for is a way to put this information in, say,
>> one file and pass its location, in some way, to the hive CLI
>> so that the Hive elasticsearch integration gets these settings when
>> trying to find the ES server to talk to.
>> I am not looking to put this information into files
>> like hive-site.xml.
>>
>> Thanks,
>>
>> Jack
>>
>> --
>> You received this message because you are subscribed
>> to the Google Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving
>> emails from it, send an email to
>> elasticsearch+unsubscribe@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/7040c805-e845-4b3d-a9fe-5e18d8445f7f%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>>
>> --
>> Costin
>>
>> --
>> You received this message because you are subscribed to
>> a topic in the Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/elasticsearch/1WH7kOD3uKs/unsubscribe.
>> To unsubscribe from this group and all its topics, send
>> an email to elasticsearch+unsubscribe@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/539D6507.3080207%40gmail.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>>
>>
>>
>> --
>> -- Jinyuan (Jack) Zhou
>>
>>
>>
>>
>> --
>> Costin
>>
>>
>>
>>
>
> --
> Costin
>
>