There are no risks, but it will be slower, especially when the list
after IN is very long.
Zheng
2010/8/3 我很快乐 896923...@qq.com:
Thank you for your reply.
Because my company requires that we use version 0.4.1, I can't upgrade the
version. Could you tell me which risks there are if I use the OR
Just change FetchTask.java: public boolean fetch(ArrayList<String> res)
res.add(((Text) mSerde.serialize(io.o, io.oi)).toString());
Instead of using Text.toString(), use your own method to convert from
raw bytes to unicode String.
Zheng
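If the goal is just to avoid failures on malformed UTF-8, a lenient decoder along these lines could replace the Text.toString() call (a minimal sketch; the class and method names are made up, not part of Hive):

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class LenientUtf8 {
  // Decode raw Text bytes, substituting the replacement character for bad
  // byte sequences instead of failing.
  public static String decode(byte[] bytes, int length) {
    try {
      return Charset.forName("UTF-8").newDecoder()
          .onMalformedInput(CodingErrorAction.REPLACE)
          .onUnmappableCharacter(CodingErrorAction.REPLACE)
          .decode(ByteBuffer.wrap(bytes, 0, length))
          .toString();
    } catch (CharacterCodingException e) {
      throw new RuntimeException(e); // cannot happen with REPLACE
    }
  }
}

The fetch() line above would then call LenientUtf8.decode(t.getBytes(), t.getLength()) on the serialized Text instead of toString().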
On Sun, Aug 1, 2010 at 8:31 PM, bc Wong
No, but it's very simple to write one.
import java.nio.charset.MalformedInputException;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MyUTF8StringChecker extends UDF {
  // Returns true if the Text value holds well-formed UTF-8 bytes.
  public boolean evaluate(Text t) {
    try {
      Text.validateUTF8(t.getBytes(), 0, t.getLength());
      return true;
    } catch (MalformedInputException e) {
      return false;
    }
  }
}
On Tue,
If you just need to scan the data once, it makes sense to use hive
SerDe to read the data directly (which saves you one I/O round trip).
If you need to read the data multiple times, then it's better to save
the 3 columns into separate files.
Zheng
On Mon, Jul 12, 2010 at 5:08 PM, Leo Alekseyev
Yes. Even a normal (non-generic) UDF might work if all columns can be
converted to the same type. A UDF can accept a variable number of
arguments of the same type.
It will be a great addition to let UDF/UDAF handle * (as well as `regex`).
The change is all compile-time, and is relatively simple.
here, no?
https://issues.apache.org/jira/browse/HIVE-302
Then what did you fix?
-- amr
On 6/10/2010 10:22 PM, Zheng Shao wrote:
Also, changing LINES TERMINATED BY probably won't work, because
hadoop's TextInputFormat does not allow line terminators other than
\n.
Zheng
On Thu, Jun 10, 2010 at 6:31 PM, Carl Steinbach c...@cloudera.com wrote:
Hi Shuja,
The grammar for Hive's CREATE TABLE statement is discussed
here:
, if I change the alias of subquery 't1' (either the inner one or the
join result), the bug disappears. I'm wondering whether it is possible that
table aliases at different levels conflict when their names are the
same.
2010/5/12 Zheng Shao zsh...@gmail.com
Yes that does seem
Do you need to get all records in order? In most of our use cases
users are only interested in the top 100 or so. If you do LIMIT
100 together with ORDER BY, it will be much faster.
Sent from my iPhone
On May 12, 2010, at 1:54 PM, luocan19826...@sohu.com wrote:
Thanks, Ted.
If
Put t1.obj<>t2.obj in the where clause.
On Fri, Apr 30, 2010 at 12:14 AM, Harshit Kumar ku...@bike.snu.ac.kr wrote:
Hi
I have a query like this
from spo t1 join spo t2 on (t1.sub=t2.sub and t1.obj<>t2.obj) insert
overwrite table spojoin select t1.sub, t1.pre, t2.obj, t2.sub, t2.pre,
t2.obj;
. The Pig script makes 3 mappers for the M/R job.
What should I check further? Job config info?
- Youngwoo
2010/4/22 Zheng Shao zsh...@gmail.com
It should be automatically supported. You don't need to do anything
except adding the bzip2 codec in io.compression.codecs in hadoop
configuration files (core-site.xml)
Zheng
On Wed, Apr 21, 2010 at 10:15 PM, 김영우 warwit...@gmail.com wrote:
Hi,
HADOOP-4012,
It's as simple as taking a hashcode of the key and taking it mod the number of
reducers. To get started, try any of the .q files in the clientpositive
directory.
On the code side, HiveKey.java has the implementation.
Sent from my iPhone
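For readers who want the gist without opening HiveKey.java, the default scheme boils down to the usual Hadoop hash partitioning; a simplified illustration (not the actual Hive code):

public class PartitionSketch {
  // Mask the sign bit so the result is non-negative, then take it modulo
  // the number of reducers.
  public static int partitionFor(Object key, int numReducers) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
  }
}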
On Apr 11, 2010, at 2:48 PM, Aaron McCurry amccu...@gmail.com
6, 2010 at 3:12 PM, Zheng Shao zsh...@gmail.com wrote:
Are you using Java 1.5? Hive now requires Java 1.6
On Tue, Apr 6, 2010 at 7:23 AM, Aaron McCurry amccu...@gmail.com wrote:
In the past I have used hive 0.3.0 successfully and now with a new
project
coming up I decided to give hive
That change should be fine.
Zheng
On Tue, Apr 6, 2010 at 5:16 PM, Dilip Joseph
dilip.antony.jos...@gmail.com wrote:
Hello,
I got the following error when creating a table with a column that has
an ARRAY of STRUCTS with many fields. It appears that there is a 128
character limit on the
See http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin for details.
Zheng
On Mon, Apr 5, 2010 at 12:01 AM, Sagar Naik sn...@attributor.com wrote:
Hi
As a trial, I am trying to setup hive for local DFS,MR mode
I have set
<property>
<name>hive.metastore.uris</name>
Hive 0.4 has limited support for complex types in UDAFs.
If you are looking for an ad-hoc solution, try putting the data into a
single Text.
It will be great if you can ask the AWS guys to upgrade Hive to 0.5.
0.5 has over 100 bug fixes and is much more stable.
Zheng
On Fri, Apr 2, 2010 at 1:11 PM,
The easiest way is to write a SequenceFileInputFormat that returns a
RecordReader that has key in the value and value in the key.
Zheng
On Fri, Apr 2, 2010 at 2:16 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
I have some sequence files in which all our data is in the key.
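A minimal sketch of such a wrapper against the old mapred API might look like this (class name hypothetical, untested):

import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.*;

// Hypothetical wrapper: presents the SequenceFile key as the record value,
// so Hive (which reads the data from the value) can see it.
public class KeyAsValueInputFormat extends SequenceFileInputFormat<Writable, Writable> {
  public RecordReader<Writable, Writable> getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {
    final RecordReader<Writable, Writable> inner = super.getRecordReader(split, job, reporter);
    return new RecordReader<Writable, Writable>() {
      public boolean next(Writable key, Writable value) throws IOException {
        // Read into swapped slots: the file's key lands in Hive's value.
        return inner.next(value, key);
      }
      public Writable createKey() { return inner.createValue(); }
      public Writable createValue() { return inner.createKey(); }
      public long getPos() throws IOException { return inner.getPos(); }
      public float getProgress() throws IOException { return inner.getProgress(); }
      public void close() throws IOException { inner.close(); }
    };
  }
}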
I will take a look. Thanks Bryan!
On Thu, Apr 1, 2010 at 12:38 AM, Bryan Talbot btal...@aeriagames.com wrote:
I guess most places are running their clusters with UTC time zones or these
functions are not widely used.
Any chance of getting a committer to look at the patch with unit tests?
Setting TZ in your .bash_profile won't work because the map/reduce tasks
run on the hadoop cluster nodes.
If you start your hadoop tasktracker with that TZ setting, it will probably
work.
Zheng
On Thu, Apr 1, 2010 at 3:32 PM, tom kersnick hiveu...@gmail.com wrote:
So it's working, but I'm having a
Hive also loads hadoop conf in HADOOP_HOME/conf. You can set it there.
On 3/23/10, Ryan LeCompte lecom...@gmail.com wrote:
Right now when we submit queries, it uses the hadoop scheduler. I have a
custom fair share scheduler configured as well, but I see that jobs
generated from our Hive
Glad to know that Hive has good performance compared with other languages.
It will be great if you can publish the queries/codes in the
benchmark, as well as environment setup, so that other people can
rerun your benchmark easily.
Zheng
On Tue, Mar 23, 2010 at 7:11 AM, Rob Stewart
From 0.5 (probably), we can add type information to the column names after
AS.
Note that the first level separator should be TAB, and the second
separator should be ^B (and then ^C, etc)
FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
INSERT OVERWRITE TABLE SS
REDUCE *
to userid.
Dilip
On Mon, Mar 22, 2010 at 2:20 PM, Zheng Shao zsh...@gmail.com wrote:
BinarySortableSerDe, LazySimpleSerDe, and LazyBinarySerDe all support
arrays/structs.
There is a UDF called size(var) that can return the size of an array.
Zheng
On Sun, Mar 21, 2010 at 9:19 PM, Adam O'Donnell a...@immunet.com wrote:
First of all, thank you to all of the facebook guys for
Multiple levels of delimiters work as follows by default:
The first level (field delimiter) is \001 (^A, ascii code 1).
Each level of struct and array takes one additional field delimiter
(\002, etc). Each level of map takes two additional delimiter levels
(one to separate entries, one to separate each key from its value).
So it will
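To make the layout concrete, an illustrative row (column names made up) for a table (id int, tags array<string>, props map<string,string>) would be stored with the default delimiters as:

1^Aa^Bb^Ak1^Cv1^Bk2^Cv2

which decodes to id=1, tags=['a','b'], props={'k1':'v1','k2':'v2'}: ^A separates columns, ^B separates array elements and map entries, ^C separates each map key from its value.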
What is the format of your data?
TBinaryProtocol does not work with TextFile format, as you can imagine.
On 3/10/10, Anty anty@gmail.com wrote:
Hi: ALL
I encounter a problem, any suggestion will be appreciated!
MY hive version is 0.30.0
I create a table in CLI.
CREATE TABLE table2
Try Double[]. Primitive arrays (like double[], int[]) are not
supported yet, because that would need special handling for each
primitive type.
Zheng
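Following that advice, a UDF taking the boxed array type might look like this (a hedged sketch; the class name and behavior are made up for illustration):

import org.apache.hadoop.hive.ql.exec.UDF;

public class UDFSumArray extends UDF {
  // Sum the elements of an array column, received as Double[] (boxed),
  // not double[] (primitive).
  public Double evaluate(Double[] values) {
    if (values == null) {
      return null;
    }
    double sum = 0.0;
    for (Double v : values) {
      if (v != null) {   // NULL elements are skipped
        sum += v;
      }
    }
    return sum;
  }
}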
On Wed, Mar 10, 2010 at 4:55 PM, tom kersnick hiveu...@gmail.com wrote:
Gents,
Any ideas why this happens? Im using hive 0.50 with hadoop
WHERE product_name IS NOT NULL AND product_name <> ''
On Tue, Mar 9, 2010 at 12:45 AM, prakash sejwani
prakashsejw...@gmail.com wrote:
Yes, right. Can you give me a tip on how to exclude blank values?
On Tue, Mar 9, 2010 at 2:13 PM, Zheng Shao zsh...@gmail.com wrote:
So I guess you didn't exclude
Do you want to try hive release 0.5.0 or hive trunk?
We should have provided better error messages here:
https://issues.apache.org/jira/browse/HIVE-1216
Zheng
On Thu, Mar 4, 2010 at 12:34 PM, Tom Nichols tmnich...@gmail.com wrote:
I am trying out Hive, using Cloudera's EC2 distribution (Hadoop
There is an extra ',' before FROM.
cast(regexp_extract(resource, '/companies/(\\d+)', 1) AS INT)
AS company_id,
-- Run our User Defined Function (see
src/com/econify/geoip/IpToCountry.java). Takes the IP of the hit and
looks up its country
-- ip_to_country(ip) AS ip_country
.
If you'd like to network with fellow Hive/Hadoop users online, feel
free to find them here:
http://www.facebook.com/event.php?eid=319237846974
Zheng
On Fri, Feb 26, 2010 at 1:56 PM, Zheng Shao zsh...@gmail.com wrote:
Hi all,
We are going to hold the second Hive User Group Meeting at 7PM on
3/18
/UserGroupInformation.html
shows such a constructor.
Now, my question is: is this something that can be fixed by shims? Or is it a
problem with hadoop?
-Original Message-
From: Zheng Shao [mailto:zsh...@gmail.com]
Sent: Saturday, February 27, 2010 4:24 AM
To: hive-user@hadoop.apache.org
Hi Mazar,
We have not tried Hive on Hadoop higher than 0.20 yet.
However, Hive has the shim infrastructure which makes it easy to port
to new Hadoop versions.
Please see the shim directory inside Hive.
Zheng
On Fri, Feb 26, 2010 at 1:59 PM, Massoud Mazar massoud.ma...@avg.com wrote:
Is it
Hi all,
We are going to hold the second Hive User Group Meeting at 7PM on
3/18/2010 Thursday.
The agenda will be:
* Hive Tutorial: 20 min
* Hive User Case Study: 20 min
* New Features and API: 25 min
JDBC/ODBC and CTAS
UDF/UDAF/UDTF
Create View/HBaseInputFormat
Hive Join Strategy
SerDe
Since Hive runs many mappers/reducers in parallel, there is no way to
generate a globally unique increasing row id.
If you are OK with that, you can easily write a non-deterministic
UDF. See rand() (or UDFRand.java) for example.
Please open a JIRA if you plan to work on that.
Zheng
On Wed, Feb
Most probably $TMPDIR does not exist.
I think by default it's /tmp/user. Can you mkdir it?
On Thu, Feb 25, 2010 at 5:58 AM, Aryeh Berkowitz ar...@iswcorp.com wrote:
Can anybody tell me why I’m getting this error?
hive show tables;
OK
email
html_href
html_src
ipadrr
could use mapred.task.id to get a unique string.
-Todd
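Putting the two suggestions together, a sketch of such a non-deterministic UDF (modelled on the rand()/UDFRand.java pattern; the class name is made up) could be:

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.io.LongWritable;

// Each mapper or reducer JVM emits 1, 2, 3, ... Marked non-deterministic so
// Hive does not fold the call into a constant.
@UDFType(deterministic = false)
public class UDFRowCounter extends UDF {
  private final LongWritable counter = new LongWritable(0);

  public LongWritable evaluate() {
    counter.set(counter.get() + 1);
    return counter;
  }
}

Concatenating a per-task string such as the value of mapred.task.id with this counter gives an id that is unique within the job, though not increasing across tasks.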
On Thu, Feb 25, 2010 at 12:42 AM, Zheng Shao zsh...@gmail.com wrote:
Hi folks,
We have released Hive 0.5.0.
You can find it from the download page in 24 hours (still waiting to
be mirrored)
http://hadoop.apache.org/hive/releases.html#Download
--
Yours,
Zheng
?
-Original Message-
From: Zheng Shao [mailto:zsh...@gmail.com]
Sent: Wednesday, February 24, 2010 3:34 AM
To: hive-user@hadoop.apache.org; hive-...@hadoop.apache.org
Subject: [ANNOUNCE] Hive 0.5.0 released
, Ryan LeCompte lecom...@gmail.com wrote:
Ah, interesting.
Using Hadoop 0.20.1. Is this the problematic version?
Thanks,
Ryan
On Wed, Feb 24, 2010 at 12:50 PM, Zheng Shao zsh...@gmail.com wrote:
Thanks for the feedback.
Which exact version of hadoop are you using?
There is a bug
export
HADOOP_CLASSPATH=/master/hadoop/json.jar:/master/hadoop/hbase-0.20.2/hbase-0.20.2.jar:/master/hadoop/hbase-0.20.2/lib/zookeeper-3.2.1.jar:/master/hadoop/hive/build/dist/lib/:/master/hadoop/hive/build/dist/lib/*.jar:/master/hadoop/hive/build/dist/conf/
should be:
export
Hi,
I just made a release candidate at
https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.5.0-rc1
The tarballs are at: http://people.apache.org/~zshao/hive-0.5.0-candidate-1/
The HWI startup problem is fixed in rc1. This supersedes the previous
email about voting on rc0.
Please vote.
Can you generate a patch for 0.5? The patch does not work on branch-0.5
Zheng
On 2/19/10, Edward Capriolo edlinuxg...@gmail.com wrote:
On Fri, Feb 19, 2010 at 9:49 PM, Zheng Shao zsh...@gmail.com wrote:
Hi,
I just made a release candidate at
https://svn.apache.org/repos/asf/hadoop/hive
is compressed, which ones do I have to set?
Saurabh.
On Fri, Feb 19, 2010 at 12:37 AM, Zheng Shao zsh...@gmail.com wrote:
Did you also:
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
Zheng
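For reference, the output-compression settings usually go together roughly like this (the compression type only matters for SequenceFile output):

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.compression.type=BLOCK;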
On Thu, Feb 18, 2010 at 8:25 AM, Saurabh Nanda saurabhna...@gmail.com
wrote:
Hi
Can you open a JIRA and help propose some concrete design of the change?
That will help make it faster to have this feature.
Thanks,
Zheng
On Fri, Feb 19, 2010 at 6:17 AM, Andy Kent andy.k...@forward.co.uk wrote:
When executing commands on the hive command line it gives really useful output
if
Hi Jerome,
Is there any update on this?
https://issues.apache.org/jira/browse/HIVE-259
Zheng
On Fri, Feb 5, 2010 at 9:34 AM, Jerome Boulon jbou...@netflix.com wrote:
Hi Bryan,
I'm working on Hive-259. I'll post an update early next week.
/Jerome.
On 2/4/10 9:08 PM, Bryan Talbot
Jason,
Do you want to open a JIRA and contribute your map_explode function to Hive?
That will be greatly appreciated.
Zheng
On Fri, Feb 19, 2010 at 2:49 PM, Yongqiang He
heyongqi...@software.ict.ac.cn wrote:
Hi Jason,
This is a known bug, see https://issues.apache.org/jira/browse/HIVE-1056
Hi,
I just made a release candidate at
https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.5.0-rc0
The tarballs are at: http://people.apache.org/~zshao/hive-0.4.1-candidate-3/
Please vote.
--
Yours,
Zheng
. Will do some more tests and get back.
Saurabh.
On Mon, Feb 1, 2010 at 1:22 PM, Zheng Shao zsh...@gmail.com wrote:
There is no command to do that right now.
One way to go is to create another external table pointing to the same
location (and forget about the old table).
Or you can move the files first, before dropping and recreating the same table.
Zheng
On Thu, Feb 18, 2010 at 10:22 AM, Eva Tse
https://issues.apache.org/jira/browse/HIVE-917 might be what you want
(suppose both of the tables are already bucketed on the join column).
Zheng
On Thu, Feb 18, 2010 at 2:53 PM, Ning Zhang nzh...@facebook.com wrote:
1GB of the small table is usually too large for map-side joins. If the raw
HIVE-1181 for branch 0.5.
Zheng
-- Forwarded message --
From: Andy Kent andy.k...@forward.co.uk
Date: Thu, Feb 18, 2010 at 3:17 PM
Subject: Re: Hive Server Leaking File Descriptors?
To: hive-user@hadoop.apache.org hive-user@hadoop.apache.org
On 18 Feb 2010, at 20:29, Zheng Shao
The stacktrace that you showed is from the hive cli right?
Did you define HADOOP_CLASSPATH somewhere?
Hive modifies HADOOP_CLASSPATH so it's important to modify it by
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/my/new/path instead of
directly overwriting it.
Zheng
On Thu, Feb 18, 2010 at 9:22
In which directory did you start hive? hive should be started in build/dist
Zheng
On Wed, Feb 17, 2010 at 2:23 AM, Vidyasagar Venkata Nallapati
vidyasagar.nallap...@onmobile.com wrote:
Hi ,
When starting hive I am getting an error even after including it in the
class path; attached is
I just corrected the wiki page. It will also be a good idea to support
case-insensitive boolean values in the code.
Zheng
On Wed, Feb 17, 2010 at 9:27 AM, Brent Miller brentalanmil...@gmail.com wrote:
Thanks Adam, that works for me as well.
It seems that the property for
I think this is worth exploring. Unit test time keeps getting longer and
longer as more code and tests are added.
Do you want to start a JIRA issue and discuss more about it?
Zheng
On Wed, Feb 17, 2010 at 8:53 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
I made an ant target quick-test, which
(like Pig) hive automatically detects .bz2
extensions and applies the appropriate decompression. Am I wrong?
-Prasen
On Thu, Feb 18, 2010 at 3:04 AM, Zheng Shao zsh...@gmail.com wrote:
I just corrected the wiki page. It will also be a good idea to support
case-insensitive boolean values
and appreciate,
-Prasen
On Thu, Feb 18, 2010 at 11:52 AM, Zheng Shao zsh...@gmail.com wrote:
There is no special setting for bz2.
Can you get the debug log?
Zheng
On Wed, Feb 17, 2010 at 9:02 PM, prasenjit mukherjee
pmukher...@quattrowireless.com wrote:
So I tried the same with .gz
in their hadoop-site.xml. Is there a way I
can pass those parameters from hive, so that I don't need to manually change
the file?
-Thanks,
Prasen
On Thu, Feb 18, 2010 at 12:54 PM, Zheng Shao zsh...@gmail.com wrote:
Just remember that we need to have the BZipCodec class in the
following hadoop
Hive branch 0.5 was created 5 weeks ago:
https://svn.apache.org/viewvc/hadoop/hive/branches/branch-0.5/
It has also been running as the production version of Hive at Facebook
for 2 weeks.
We'd like to start making release candidates (for 0.5.0) from branch 0.5.
Please vote.
--
Yours,
Zheng
Can you go to that box, sudo as root, and do lsof | grep 12345 where
12345 is the process id of the hive server?
We should be able to see the names of the files that are open.
Zheng
On Mon, Feb 15, 2010 at 7:42 AM, Andy Kent andy.k...@forward.co.uk wrote:
Nope, no luck so far.
We have upped
MySQL is recommended for multiple-node deployment of Hive. Can you try MySQL?
Zheng
On Mon, Feb 8, 2010 at 6:32 PM, Mafish Liu maf...@gmail.com wrote:
Hi, all:
I'm deploying hive from node A to node B. Hive on node A works
properly while on node B, when I try to create a new table, I got the
Hi Roberto,
The reason that Text is passed in is because the table is defined as
TextFile format (the default).
There are some examples (*.q files) of using SequenceFile format (
CREATE TABLE xxx STORED AS SEQUENCEFILE).
SEQUENCEFILE will return BytesWritable by default.
Please have a try.
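For example (table and column names made up):

CREATE TABLE my_seq_table (key STRING, value STRING)
STORED AS SEQUENCEFILE;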
What commands did you run? With which release?
Zheng
On Wed, Feb 10, 2010 at 11:20 PM, Vidyasagar Venkata Nallapati
vidyasagar.nallap...@onmobile.com wrote:
Hi,
Installation is giving an error as
master/hadoop/hadoop-0.20.1/build.xml:895: 'java5.home' is not defined.
Forrest requires
add file myfile.txt;
You can find some examples in *.q files in the distribution.
Zheng
On Thu, Feb 11, 2010 at 10:23 PM, Adam O'Donnell a...@immunet.com wrote:
Guys:
How do you go about distributing additional files that may be needed
by your reduce scripts? For example, I need to
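The usual pattern (paths, script, and table names here are made up) is to ADD FILE everything the script needs; added files end up in each task's working directory, so the script can open them by their bare names:

ADD FILE /local/path/my_reducer.py;
ADD FILE /local/path/lookup.txt;

SELECT TRANSFORM (col1, col2)
USING 'my_reducer.py'
AS (col1, col2)
FROM src;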
Another possible reason: we have found that sometimes the hadoop framework
does not return the correct count to the clients.
In all these cases, the count is smaller than the number of rows
actually loaded.
Which version of hadoop are you using?
Zheng
On Mon, Feb 8, 2010 at 11:27 PM, Jeff Hammerbacher
Looks like an lzo codec problem. Can you try a simple mapreduce program
that outputs with lzo compression and the same output file format as your hive
table?
On 2/9/10, Bennie Schut bsc...@ebuddy.com wrote:
I have a bit of an edge case on using lzo which I think might be related
to HIVE-524.
When
Yes that's correct. I prefer to download the jars in add jar.
Zheng
On Mon, Feb 8, 2010 at 3:46 PM, Philip Zeyliger phi...@cloudera.com wrote:
Hi folks,
I have a quick question about UDF support in Hive. I'm on the 0.5 branch.
Can you use a UDF where the jar which contains the function is
That seems to be a bug.
Are you using hive trunk or any release?
On 2/5/10, Bennie Schut bsc...@ebuddy.com wrote:
I have a tab-separated file; I loaded it with load data inpath
then I do a
SET hive.exec.compress.output=true;
SET
-Original Message-
From: Zheng Shao [mailto:zsh...@gmail.com]
Sent: Friday, February 05, 2010 12:47 PM
To: hive-user@hadoop.apache.org
Subject: Re: Hive Installation Problem
Added to http://wiki.apache.org/hadoop/Hive/FAQ
Zheng
On Thu, Feb 4, 2010 at 11:11 PM, Zheng Shao zsh
We can load data/insert overwrite data concurrently as long as they
go to different partitions.
On Thu, Feb 4, 2010 at 6:51 AM, Ryan LeCompte lecom...@gmail.com wrote:
Hey guys,
Is it possible to concurrently load data into Hive tables (same table,
different partition)? I'd like to concurrently
We haven't had a plan yet. It will be great to draw out the pros/cons
of moving to the new MapReduce API.
Do you want to open a JIRA to discuss it?
Zheng
On Thu, Feb 4, 2010 at 5:46 PM, Schubert Zhang zson...@gmail.com wrote:
Does anyone know the plan of Hive to support new Hadoop MapReduce
I would say, just create a histogram of value, count pairs, sort at
the end, and return the value at the percentile.
This assumes that the number of unique values is not big, which can
be easily enforced by using round(number, digits).
Zheng
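A rough sketch of that idea (plain Java, not a full UDAF; names made up):

import java.util.Map;
import java.util.TreeMap;

public class PercentileSketch {
  // value -> count histogram, kept sorted by value.
  private final TreeMap<Double, Long> histogram = new TreeMap<Double, Long>();
  private long total = 0;

  public void add(double value) {
    Long c = histogram.get(value);
    histogram.put(value, c == null ? 1L : c + 1L);
    total++;
  }

  // percentile in [0, 1]; walk the sorted histogram until the target rank is reached.
  public Double valueAt(double percentile) {
    long targetRank = (long) Math.ceil(percentile * total);
    long seen = 0;
    for (Map.Entry<Double, Long> e : histogram.entrySet()) {
      seen += e.getValue();
      if (seen >= targetRank) {
        return e.getKey();
      }
    }
    return null;   // empty histogram
  }
}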
On Thu, Feb 4, 2010 at 9:08 PM, Bryan Talbot
Added to http://wiki.apache.org/hadoop/Hive/FAQ
Zheng
On Thu, Feb 4, 2010 at 11:11 PM, Zheng Shao zsh...@gmail.com wrote:
Try this:
cd ~/.ant/cache/hadoop/core/sources
wget
http://archive.apache.org/dist/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
Zheng
On Thu, Feb 4, 2010 at 10:23
Can you post the Hive query? What are the types of the parameters that
you passed to the function?
Zheng
On Wed, Feb 3, 2010 at 3:23 AM, Sonal Goyal sonalgoy...@gmail.com wrote:
Hi,
I am writing a UDAF which takes in 4 parameters. I have 2 cases - one where
all the parameters are ints, and
See ql/src/test/queries/clientpositive/uniquejoin.q
FROM UNIQUEJOIN PRESERVE T1 a (a.key), PRESERVE T2 b (b.key), PRESERVE
T3 c (c.key)
SELECT a.key, b.key, c.key;
FROM UNIQUEJOIN T1 a (a.key), T2 b (b.key), T3 c (c.key)
SELECT a.key, b.key, c.key;
FROM UNIQUEJOIN T1 a (a.key), T2 b (b.key-1),
https://issues.apache.org/jira/browse/HIVE-591
On Wed, Feb 3, 2010 at 1:34 PM, Zheng Shao zsh...@gmail.com wrote:
If the join key is the same, you can use unique join to make sure
it's done in a single map-reduce job.
Zheng
On Wed, Feb 3, 2010 at 1:25 AM, bharath v
bharathvissapragada1...@gmail.com wrote:
Hi ,
I have a small doubt about how hive handles queries containing joins of more
than 2 tables.
the function to output something and to verify that Hive specific hooks are
in place. If you have any suggestions, please do let me know.
Thanks and Regards,
Sonal
On Mon, Feb 1, 2010 at 1:19 PM, Zheng Shao zsh...@gmail.com wrote:
The first problem is:
private Integer key
(RunJar.java:156)
Thanks and Regards,
Sonal
On Thu, Feb 4, 2010 at 12:12 AM, Zheng Shao zsh...@gmail.com wrote:
Can you post the Hive query? What are the types of the parameters that
you passed to the function?
Zheng
On Wed, Feb 3, 2010 at 3:23 AM, Sonal Goyal sonalgoy...@gmail.com
, Zheng Shao zsh...@gmail.com wrote:
Hi Sonal,
1. We usually move the group_by column out of the UDAF - just like we
do SELECT key, sum(value) FROM table.
I think you should write:
SELECT customer_id, topx(2, product_id, product_count)
FROM products_bought
and in topx:
public boolean
I would first check whether it is really the block compression or
record compression.
Also maybe the block size is too small but I am not sure that is
tunable in SequenceFile or not.
Zheng
On Sun, Jan 31, 2010 at 9:03 PM, Saurabh Nanda saurabhna...@gmail.com wrote:
Hi,
The size of my Gzipped
The easiest way to go is to write a UDAF to return the answer in
array<struct<decile:int, value:double>>.
Then you can do: (note that explode is a predefined UDTF)
SELECT
tmp.key, tmp2.d.decile, tmp2.d.value
FROM
(SELECT key, Decile(value) as deciles
GROUP BY key) tmp
LATERAL VIEW
Can you take a look at /tmp/user/hive.log?
There should be some exceptions there.
Zheng
On Wed, Jan 27, 2010 at 7:59 PM, Fu Ecy fuzhijie1...@gmail.com wrote:
I want to load some files on HDFS to a hive table, but there is an exception
as follows:
hive load data inpath
When Hive loads data from HDFS, it moves the files instead of copying the files.
That means the current user should have write permissions to the
source files/directories as well.
Can you check that?
Zheng
On Wed, Jan 27, 2010 at 11:18 PM, Fu Ecy fuzhijie1...@gmail.com wrote:
property
org.apache.hadoop.hive.ql.exec.MoveTask
It doesn't work.
2010/1/28 Fu Ecy fuzhijie1...@gmail.com
I think this is the problem, I don't have the write permissions to the
source files/directories. Thank you, Shao :-)
2010/1/28 Zheng Shao zsh...@gmail.com
When Hive loads data from HDFS, it moves the files instead of copying
Can you post the traces in /tmp/user/hive.log?
Zheng
On Tue, Jan 26, 2010 at 12:40 AM, Jeff Zhang zjf...@gmail.com wrote:
Hi all,
I followed the getting-started wiki page, but I use the hive 0.4.1 release version
rather than svn trunk. When I invoke the command show tables;
it shows the
) at
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
... 29 more
to resolve dependencies:
resolve failed - see output for details
On Tue, Jan 26, 2010 at 6:04 PM, Zheng Shao zsh...@gmail.com wrote:
This usually happens when there is a problem in the metastore
configuration.
Did you change any hive configurations?
Zheng
On Tue, Jan 26, 2010 at 1:41
We can use a combination of UDAF and LATERAL VIEW to implement what you want.
1. Define a UDAF like this: max_n(5, products_bought, customer_id)
which returns the top 5 products_bought and their customer_id in type
of array<struct<col0:int,col1:int>>
2. Use the Lateral views (with explode) to
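Step 2 might then look roughly like this (a hedged sketch; max_n is the hypothetical UDAF from step 1 and the table name is made up):

SELECT r.item.col0 AS products_bought, r.item.col1 AS customer_id
FROM
  (SELECT max_n(5, products_bought, customer_id) AS top5
   FROM some_table) t
LATERAL VIEW explode(t.top5) r AS item;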
Hi Ankit,
org.apache.hadoop.mapreduce.lib.input.XmlInputFormat implements
the new mapreduce InputFormat API, while Hive needs an InputFormat that
implements org.apache.hadoop.mapred.InputFormat (the old API).
This might work:
If you want the files to stay there, you can try CREATE EXTERNAL
TABLE with a location (instead of create table + load)
Zheng
On Fri, Jan 22, 2010 at 10:51 AM, Bill Graham billgra...@gmail.com wrote:
Hive doesn't delete the files upon load, it moves them to a location under
the Hive warehouse
delimiter with any format.
INSERT OVERWRITE LOCAL DIRECTORY '/mnt/daily_timelines'
[ ROW FORMAT DELIMITED | SERDE ... ]
[ FILE FORMAT ...]
SELECT * FROM daily_timelines;
Is somebody still working on this feature?
On Tue, Jan 12, 2010 at 2:28 PM, Zheng Shao zsh...@gmail.com wrote:
Yes we
hi,
A single insert can extract data into '/tmp/out/1'. I can even see "xxx rows
loaded to '/tmp/out/0'", "xxx rows loaded to '/tmp/out/1'", etc. in multi
inserts, but there is no data in fact.
Haven't tried the svn revision, will try it today. Thanks.
2010/1/5 Zheng Shao zsh...@gmail.com
Looks like a bug
:
Thanks Zheng.
It does work.
I have another question: if the field delimiter is a string, e.g.
, it looks like LazySimpleSerDe can't work. Does
LazySimpleSerDe not support string field delimiters, only one-byte
control characters?
On Tue, Jan 12, 2010 at 3:05 AM, Zheng Shao zsh
Unfortunately the trunk does not run tests in parallel yet.
The majority of the time is spent in TestCliDriver which contains over
200 .q files.
We will need to separate the working directories and metastore
directories to make these .q files run in parallel.
Zheng
On Thu, Jan 7, 2010 at 11:46
Looks like a bug.
What is the svn revision of Hive?
Did you verify that single insert into '/tmp/out/1' produces non-empty files?
Zheng
On Tue, Jan 5, 2010 at 12:51 AM, wd w...@wdicc.com wrote:
In hive wiki:
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL]
Hi Saurabh,
I think we can do it with the following 3 UDFs.
make_map(trim(split(cookies, ,)), =)
ArrayList<String> split(String) See
http://issues.apache.org/jira/browse/HIVE-642
ArrayList<String> trim(ArrayList<String>) Open one for that
HashMap<String,String> make_map(ArrayList<String>, String
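A sketch of what the proposed make_map UDF could look like (hypothetical; per the notes above, a JIRA would still need to be opened for it):

import java.util.ArrayList;
import java.util.HashMap;

import org.apache.hadoop.hive.ql.exec.UDF;

public class UDFMakeMap extends UDF {
  // Turn ["k1=v1", "k2=v2"] plus the separator "=" into {"k1":"v1", "k2":"v2"}.
  public HashMap<String, String> evaluate(ArrayList<String> entries, String separator) {
    if (entries == null) {
      return null;
    }
    HashMap<String, String> result = new HashMap<String, String>();
    for (String entry : entries) {
      int pos = entry.indexOf(separator);
      if (pos < 0) {
        continue;   // skip entries without a separator
      }
      result.put(entry.substring(0, pos), entry.substring(pos + separator.length()));
    }
    return result;
  }
}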
Hi Eric,
Most probably there are leading/trailing spaces in the columns that
are defined as int.
If Hive cannot parse the field successfully, the field will become null.
You can try this to find out the rows:
SELECT * FROM raw_facts WHERE year IS NULL;
Zheng
On Mon, Jan 4, 2010 at 4:10 PM, Eric