Re: Implement in clause with or clause

2010-08-04 Thread Zheng Shao
There are no risks, but it will be slower, especially when the list after IN is very long. Zheng 2010/8/3 我很快乐 896923...@qq.com: Thank you for your reply. Because my company requires that we use version 0.4.1, I can't upgrade. Could you tell me what risks there are if I use the OR

Re: Hive support for latin1

2010-08-02 Thread Zheng Shao
Just change FetchTask.java: public boolean fetch(ArrayList<String> res) res.add(((Text) mSerde.serialize(io.o, io.oi)).toString()); Instead of using Text.toString(), use your own method to convert from raw bytes to unicode String. Zheng On Sun, Aug 1, 2010 at 8:31 PM, bc Wong
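A minimal sketch of the "own method" mentioned above, assuming the raw bytes are Latin-1 (ISO-8859-1) and that only the first `length` bytes of the buffer are valid, as with Text.getBytes() and Text.getLength(). The class and method names are hypothetical and are not part of Hive.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hive or Hadoop.
final class Latin1Decoder {
    // Decodes the first `length` bytes of `bytes` as ISO-8859-1.
    // Each byte 0x00-0xFF maps to the Unicode code point with the same value,
    // so this decoding never fails and never substitutes replacement characters.
    static String decode(byte[] bytes, int length) {
        return new String(bytes, 0, length, StandardCharsets.ISO_8859_1);
    }
}
```

In FetchTask the call would presumably replace the toString() on the serialized Text; how the Text is obtained there is left exactly as in the snippet above.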

Re: built-in UTF8 checker

2010-07-21 Thread Zheng Shao
No, but it's very simple to write one. public class MyUTF8StringChecker extends UDF { public boolean evaluate(Text t) { try { Text.validateUTF8(t.getBytes(), 0, t.getLength()); return true; } catch (MalformedInputException e) { return false; } } } On Tue,
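For trying the same check outside of Hive, here is a sketch that uses only the JDK instead of Hadoop's Text.validateUTF8. It is a stand-in under that assumption, not the UDF itself; the class name is made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone checker; mirrors the try/catch shape of the UDF above.
final class Utf8Check {
    static boolean isValidUtf8(byte[] bytes, int offset, int length) {
        // REPORT makes the decoder throw instead of silently replacing bad input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes, offset, length));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

As in the UDF, the offset and length arguments matter when the backing array is longer than the valid data.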

Re: Hive and protocol buffers -- are there UDFs for dealing with them?

2010-07-12 Thread Zheng Shao
If you just need to scan the data once, it makes sense to use hive SerDe to read the data directly (which saves you one I/O round trip). If you need to read the data multiple times, then it's better to save the 3 columns into separate files. Zheng On Mon, Jul 12, 2010 at 5:08 PM, Leo Alekseyev

Re: UDF which takes entire row as arg

2010-07-07 Thread Zheng Shao
Yes. Even a normal (non-generic) UDF might work if all columns can be converted to the same type. A UDF can accept a variable number of arguments of the same type. It would be a great addition to let UDF/UDAF handle * (as well as `regex`). The change is all compile-time, and is relatively simple.

Re: Create Table with Line Terminated other than '\n'

2010-06-12 Thread Zheng Shao
here, no? https://issues.apache.org/jira/browse/HIVE-302 Then what did you fix? -- amr On 6/10/2010 10:22 PM, Zheng Shao wrote: Also, changing LINES TERMINATED BY probably won't work, because hadoop's TextInputFormat does not allow line terminators other than \n. Zheng On Thu, Jun 10, 2010 at 6

Re: Create Table with Line Terminated other than '\n'

2010-06-10 Thread Zheng Shao
Also, changing LINES TERMINATED BY probably won't work, because hadoop's TextInputFormat does not allow line terminators other than \n. Zheng On Thu, Jun 10, 2010 at 6:31 PM, Carl Steinbach c...@cloudera.com wrote: Hi Shuja, The grammar for Hive's CREATE TABLE statement is discussed here: 

Re: BUG at optimizer or map side aggregate?

2010-05-12 Thread Zheng Shao
, if I change the alias of subquery 't1' (either the inner one or the join result), the bug disappears. I'm wondering whether table aliases at different levels can conflict when their alias names are the same. 2010/5/12 Zheng Shao zsh...@gmail.com Yes that does seem

Re: why hive ignore my setting about reduce task number?

2010-05-12 Thread Zheng Shao
Do you need to get all records in the order? In most of our use cases users are only interested in the top 100 or something. If you do limit 100 together with order by, it will be much faster. Sent from my iPhone On May 12, 2010, at 1:54 PM, luocan19826...@sohu.com wrote: Thanks, Ted. If

Re: error: Both Left and Right Aliases Encountered in Join obj

2010-04-30 Thread Zheng Shao
Put t1.objt2.obj in the where clause. On Fri, Apr 30, 2010 at 12:14 AM, Harshit Kumar ku...@bike.snu.ac.kr wrote: Hi I have a query like this from spo t1 join spo t2 on (t1.sub=t2.sub and t1.objt2.obj) insert overwrite table spojoin select t1.sub, t1.pre, t2.obj, t2.sub, t2.pre, t2.obj;

Re: HADOOP-4012 and bzip2 input splitting

2010-04-22 Thread Zheng Shao
. the pig script makes 3 mapper for M/R job. What should I check further? Job config info? - Youngwoo 2010/4/22 Zheng Shao zsh...@gmail.com It should be automatically supported. You don't need to do anything except adding the bzip2 codec in io.compression.codecs in hadoop configuration

Re: HADOOP-4012 and bzip2 input splitting

2010-04-21 Thread Zheng Shao
It should be automatically supported. You don't need to do anything except adding the bzip2 codec in io.compression.codecs in hadoop configuration files (core-site.xml) Zheng On Wed, Apr 21, 2010 at 10:15 PM, 김영우 warwit...@gmail.com wrote: Hi, HADOOP-4012,

Re: Cluster By Algorithm?

2010-04-11 Thread Zheng Shao
It's as simple as taking the hash code of the key modulo the number of reducers. To get started, try any of the .q files in the clientpositive directory. On the code side, HiveKey.java has the implementation. Sent from my iPhone On Apr 11, 2010, at 2:48 PM, Aaron McCurry amccu...@gmail.com
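The "hash code mod number of reducers" rule can be sketched as follows. This uses Java's Object.hashCode() as a stand-in; how HiveKey actually computes its hash is not shown in this thread, so treat the class below as an illustration only.

```java
// Hypothetical illustration; not the code in HiveKey.java.
final class HashBucket {
    // Returns a reducer index in [0, numReducers) for the given key.
    static int reducerFor(Object key, int numReducers) {
        // Mask off the sign bit so negative hash codes still give a non-negative index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }
}
```

Rows with equal keys always land on the same reducer, which is all that CLUSTER BY needs; nothing here balances the load if a few keys dominate.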

Re: Using newest hive release (0.5.0) - Problem with count(1)

2010-04-06 Thread Zheng Shao
6, 2010 at 3:12 PM, Zheng Shao zsh...@gmail.com wrote: Are you using Java 1.5? Hive now requires Java 1.6 On Tue, Apr 6, 2010 at 7:23 AM, Aaron McCurry amccu...@gmail.com wrote: In the past I have used hive 0.3.0 successfully and now with a new project coming up I decided to give hive

Re: Truncation error when creating table with column containing struct with many fields

2010-04-06 Thread Zheng Shao
That change should be fine. Zheng On Tue, Apr 6, 2010 at 5:16 PM, Dilip Joseph dilip.antony.jos...@gmail.com wrote: Hello, I got the following error when creating a table with a column that has an ARRAY of STRUCTS with many fields.  It appears that there is a 128 character limit on the

Re: create table exception

2010-04-05 Thread Zheng Shao
See http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin for details. Zheng On Mon, Apr 5, 2010 at 12:01 AM, Sagar Naik sn...@attributor.com wrote: Hi As a trial, I am trying to set up Hive for local DFS, MR mode. I have set <property> <name>hive.metastore.uris</name>

Re: UDAF on AWS Hive

2010-04-02 Thread Zheng Shao
Hive 0.4 has limited support for complex types in UDAFs. If you are looking for an ad-hoc solution, try putting the data into a single Text. It would be great if you could ask the AWS guys to upgrade Hive to 0.5. 0.5 has over 100 bug fixes and is much more stable. Zheng On Fri, Apr 2, 2010 at 1:11 PM,

Re: Sequence Files with data inside key

2010-04-02 Thread Zheng Shao
The easiest way is to write a SequenceFileInputFormat that returns a RecordReader that has key in the value and value in the key. Zheng On Fri, Apr 2, 2010 at 2:16 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I have some sequence files in which all our data is in the key.

Re: date_sub() function returns wrong date because of daylight saving time difference

2010-04-01 Thread Zheng Shao
I will take a look. Thanks Bryan! On Thu, Apr 1, 2010 at 12:38 AM, Bryan Talbot btal...@aeriagames.com wrote: I guess most places are running their clusters with UTC time zones or these functions are not widely used. Any chance of getting a committer to look at the patch with unit tests?

Re: unix_timestamp function

2010-04-01 Thread Zheng Shao
Setting TZ in your .bash_profile won't work because the map/reduce tasks run on the Hadoop cluster. If you start your Hadoop tasktracker with that TZ setting, it will probably work. Zheng On Thu, Apr 1, 2010 at 3:32 PM, tom kersnick hiveu...@gmail.com wrote: So it's working, but I'm having a

Re: How do I make Hive use a custom scheduler and not the default scheduler?

2010-03-23 Thread Zheng Shao
Hive also loads hadoop conf in HADOOP_HOME/conf. You can set it there. On 3/23/10, Ryan LeCompte lecom...@gmail.com wrote: Right now when we submit queries, it uses the hadoop scheduler. I have a custom fair share scheduler configured as well, but I see that jobs generated from our Hive

Re: Performance Programming Comparison of JAQL, Hive, Pig and Java

2010-03-23 Thread Zheng Shao
Glad to know that Hive has a good performance compared with other languages. It will be great if you can publish the queries/codes in the benchmark, as well as environment setup, so that other people can rerun your benchmark easily. Zheng On Tue, Mar 23, 2010 at 7:11 AM, Rob Stewart

Re: support for arrays, maps, structs while writing output of custom reduce script to table

2010-03-22 Thread Zheng Shao
From 0.5 (probably), we can add type information to the column names after AS. Note that the first level separator should be TAB, and the second separator should be ^B (and then ^C, etc) FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s    INSERT OVERWRITE TABLE SS    REDUCE *      

Re: support for arrays, maps, structs while writing output of custom reduce script to table

2010-03-22 Thread Zheng Shao
to userid. Dilip On Mon, Mar 22, 2010 at 2:20 PM, Zheng Shao zsh...@gmail.com wrote: From 0.5 (probably), we can add type information to the column names after AS. Note that the first level separator should be TAB, and the second separator should be ^B (and then ^C, etc) FROM (select * from

Re: SerDe examples that use arrays and structs?

2010-03-21 Thread Zheng Shao
BinarySortableSerDe, LazySimpleSerDe, and LazyBinarySerDe all support arrays/structs. There is a UDF called size(var) that can return the size of an array. Zheng On Sun, Mar 21, 2010 at 9:19 PM, Adam O'Donnell a...@immunet.com wrote: First of all, thank you to all of the facebook guys for

Re: delimiters for nested structures

2010-03-19 Thread Zheng Shao
Multiple levels of delimiters work as follows by default: The first level (field delimiter) is \001 (^A, ASCII code 1). Each level of struct and array takes the next delimiter in sequence (\002, etc.). Each level of map takes 2 additional levels of delimiters. So it will
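To make the levels concrete, here is a sketch that parses one text row by hand, assuming a top-level array column (elements separated by \002) and a top-level map column (entries separated by \002, key and value separated by \003). The class and the row layout are made up for illustration; this is not how the SerDe is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the default delimiter levels described above.
final class NestedDelimiters {
    // Splits on a single delimiter character, keeping empty parts.
    static List<String> split(String s, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == delimiter) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }

    // A top-level map column uses two levels: \002 between entries, \003 between key and value.
    static Map<String, String> parseMap(String field) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : split(field, '\002')) {
            List<String> kv = split(entry, '\003');
            result.put(kv.get(0), kv.size() > 1 ? kv.get(1) : null);
        }
        return result;
    }
}
```

Usage: split the row on '\001' to get the columns, then call split(column, '\002') for an array column or parseMap(column) for a map column. An array nested inside that map's values would move on to \004, following the same rule.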

Re: DynamicSerDe/TBinaryProtocol

2010-03-10 Thread Zheng Shao
What is the format of your data? TBinaryProtocol does not work with TextFile format, as you can imagine. On 3/10/10, Anty anty@gmail.com wrote: Hi: ALL I encounter a problem, any suggestion will be appreciated! MY hive version is 0.30.0 I create a table in CLI. CREATE TABLE table2

Re: Hive UDF Unknown exception:

2010-03-10 Thread Zheng Shao
Try Double[]. Primitive arrays (like double[], int[]) are not supported yet, because that needs special handling for each primitive type. Zheng On Wed, Mar 10, 2010 at 4:55 PM, tom kersnick hiveu...@gmail.com wrote: Gents, Any ideas why this happens? I'm using hive 0.50 with hadoop

Re: problem with IS NOT NULL operator in hive

2010-03-09 Thread Zheng Shao
WHERE product_name IS NOT NULL AND product_name <> '' On Tue, Mar 9, 2010 at 12:45 AM, prakash sejwani prakashsejw...@gmail.com wrote: yes right can you give me a tip how to exclude blank values On Tue, Mar 9, 2010 at 2:13 PM, Zheng Shao zsh...@gmail.com wrote: So I guess you didn't exclude

Re: All Map jobs fail with NPE in LazyStruct.uncheckedGetField

2010-03-05 Thread Zheng Shao
Do you want to try hive release 0.5.0 or hive trunk? We should have provided better error messages here: https://issues.apache.org/jira/browse/HIVE-1216 Zheng On Thu, Mar 4, 2010 at 12:34 PM, Tom Nichols tmnich...@gmail.com wrote: I am trying out Hive, using Cloudera's EC2 distribution (Hadoop

Re: complex query using FROM and INSERT in hive

2010-03-02 Thread Zheng Shao
there is an extra , before FROM cast(regexp_extract(resource, '/companies/(\\d+)', 1) AS INT) AS company_id, -- Run our User Defined Function (see src/com/econify/geoip/IpToCountry.java). Takes the IP of the hit and looks up its country -- ip_to_country(ip) AS ip_country

Re: Hive User Group Meeting 3/18/2010 7pm at Facebook

2010-03-01 Thread Zheng Shao
. If you'd like to network with fellow Hive/Hadoop users online, feel free to find them here: http://www.facebook.com/event.php?eid=319237846974 Zheng On Fri, Feb 26, 2010 at 1:56 PM, Zheng Shao zsh...@gmail.com wrote: Hi all, We are going to hold the second Hive User Group Meeting at 7PM on 3/18

Re: hive 0.50 on hadoop 0.22

2010-03-01 Thread Zheng Shao
/UserGroupInformation.html shows such a constructor. Now, my question is: is this something that can be fixed by shims? Or it is a problem with hadoop? -Original Message- From: Zheng Shao [mailto:zsh...@gmail.com] Sent: Saturday, February 27, 2010 4:24 AM To: hive-user@hadoop.apache.org

Re: hive 0.50 on hadoop 0.22

2010-02-27 Thread Zheng Shao
Hi Mazar, We have not tried Hive on Hadoop higher than 0.20 yet. However, Hive has the shim infrastructure which makes it easy to port to new Hadoop versions. Please see the shim directory inside Hive. Zheng On Fri, Feb 26, 2010 at 1:59 PM, Massoud Mazar massoud.ma...@avg.com wrote: Is it

Hive User Group Meeting 3/18/2010 7pm at Facebook

2010-02-26 Thread Zheng Shao
Hi all, We are going to hold the second Hive User Group Meeting at 7PM on 3/18/2010 Thursday. The agenda will be: * Hive Tutorial: 20 min * Hive User Case Study: 20 min * New Features and API: 25 min JDBC/ODBC and CTAS UDF/UDAF/UDTF Create View/HBaseInputFormat Hive Join Strategy SerDe

Re: How to generate Row Id in Hive?

2010-02-25 Thread Zheng Shao
Since Hive runs many mappers/reducers in parallel, there is no way to generate a globally unique increasing row id. If you are OK with that, you can easily write a non-deterministic UDF. See rand() (or UDFRand.java) for example. Please open a JIRA if you plan to work on that. Zheng On Wed, Feb

Re: Execution Error

2010-02-25 Thread Zheng Shao
Most probably $TMPDIR does not exist. I think by default it's /tmp/user. Can you mkdir ? On Thu, Feb 25, 2010 at 5:58 AM, Aryeh Berkowitz ar...@iswcorp.com wrote: Can anybody tell me why I’m getting this error? hive> show tables; OK email html_href html_src ipadrr

Re: How to generate Row Id in Hive?

2010-02-25 Thread Zheng Shao
could use mapred.task.id to get a unique string. -Todd On Thu, Feb 25, 2010 at 12:42 AM, Zheng Shao zsh...@gmail.com wrote: Since Hive runs many mappers/reducers in parallel, there is no way to generate a globally unique increasing row id. If you are OK with that, you can easily write a non
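A sketch of the idea in this exchange: ids that are unique across tasks but not globally increasing, built from a per-task string plus a local counter. The class is hypothetical, and how the task id string is obtained inside a UDF (for example from mapred.task.id in the job configuration) is assumed rather than shown.

```java
// Hypothetical generator; one instance per task, not thread-safe.
final class RowIdGenerator {
    private final String taskId;
    private long next = 0;

    RowIdGenerator(String taskId) {
        this.taskId = taskId;
    }

    // Ids are unique as long as task ids are unique; the counter only increases within one task.
    String nextId() {
        return taskId + "_" + (next++);
    }
}
```

Because the result changes on every call, a UDF wrapping this would have to be non-deterministic, like rand().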

[ANNOUNCE] Hive 0.5.0 released

2010-02-24 Thread Zheng Shao
Hi folks, We have released Hive 0.5.0. You can find it from the download page in 24 hours (still waiting to be mirrored) http://hadoop.apache.org/hive/releases.html#Download -- Yours, Zheng

Re: [ANNOUNCE] Hive 0.5.0 released

2010-02-24 Thread Zheng Shao
? -Original Message- From: Zheng Shao [mailto:zsh...@gmail.com] Sent: Wednesday, February 24, 2010 3:34 AM To: hive-user@hadoop.apache.org; hive-...@hadoop.apache.org Subject: [ANNOUNCE] Hive 0.5.0 released Hi folks, We have released Hive 0.5.0. You can find it from the download

Re: [ANNOUNCE] Hive 0.5.0 released

2010-02-24 Thread Zheng Shao
, Ryan LeCompte lecom...@gmail.com wrote: Ah, interesting. Using Hadoop 0.20.1. Is this the problematic version? Thanks, Ryan On Wed, Feb 24, 2010 at 12:50 PM, Zheng Shao zsh...@gmail.com wrote: Thanks for the feedback. Which exact version of hadoop are you using? There is a bug

Re: Error while starting hive

2010-02-21 Thread Zheng Shao
export HADOOP_CLASSPATH=/master/hadoop/json.jar:/master/hadoop/hbase-0.20.2/hbase-0.20.2.jar:/master/hadoop/hbase-0.20.2/lib/zookeeper-3.2.1.jar:/master/hadoop/hive/build/dist/lib/:/master/hadoop/hive/build/dist/lib/*.jar:/master/hadoop/hive/build/dist/conf/ should be: export

[VOTE] hive 0.5.0 release (rc1)

2010-02-21 Thread Zheng Shao
Hi, I just made a release candidate at https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.5.0-rc1 The tarballs are at: http://people.apache.org/~zshao/hive-0.5.0-candidate-1/ The HWI startup problem is fixed in rc1. This supersedes the previous email about voting on rc0. Please vote.

Re: [VOTE] hive 0.5.0 release candidate 0

2010-02-20 Thread Zheng Shao
Can you generate a patch for 0.5? The patch does not work on branch-0.5 Zheng On 2/19/10, Edward Capriolo edlinuxg...@gmail.com wrote: On Fri, Feb 19, 2010 at 9:49 PM, Zheng Shao zsh...@gmail.com wrote: Hi, I just made a release candidate at https://svn.apache.org/repos/asf/hadoop/hive

Re: SequenceFile compression on Amazon EMR not very good

2010-02-19 Thread Zheng Shao
is compressed, which ones do I have to set? Saurabh. On Fri, Feb 19, 2010 at 12:37 AM, Zheng Shao zsh...@gmail.com wrote: Did you also: SET mapred.output.compression.codec=org.apacheGZipCode; Zheng On Thu, Feb 18, 2010 at 8:25 AM, Saurabh Nanda saurabhna...@gmail.com wrote: Hi

Re: Thrift Server Error Messages

2010-02-19 Thread Zheng Shao
Can you open a JIRA and help propose some concrete design of the change? That will help make it faster to have this feature. Thanks, Zheng On Fri, Feb 19, 2010 at 6:17 AM, Andy Kent andy.k...@forward.co.uk wrote: When executing commands on the hive command line it gives really useful output if

Re: computing median and percentiles

2010-02-19 Thread Zheng Shao
Hi Jerome, Is there any update on this? https://issues.apache.org/jira/browse/HIVE-259 Zheng On Fri, Feb 5, 2010 at 9:34 AM, Jerome Boulon jbou...@netflix.com wrote: Hi Bryan, I'm working on Hive-259. I'll post an update early next week. /Jerome. On 2/4/10 9:08 PM, Bryan Talbot

Re: Having trouble with lateral view

2010-02-19 Thread Zheng Shao
Jason, Do you want to open a JIRA and contribute your map_explode function to Hive? That will be greatly appreciated. Zheng On Fri, Feb 19, 2010 at 2:49 PM, Yongqiang He heyongqi...@software.ict.ac.cn wrote: Hi Jason, This is a known bug, see https://issues.apache.org/jira/browse/HIVE-1056

[VOTE] hive 0.5.0 release candidate 0

2010-02-19 Thread Zheng Shao
Hi, I just made a release candidate at https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.5.0-rc0 The tarballs are at: http://people.apache.org/~zshao/hive-0.4.1-candidate-3/ Please vote. -- Yours, Zheng

Re: SequenceFile compression on Amazon EMR not very good

2010-02-18 Thread Zheng Shao
. Will do some more tests and get back. Saurabh. On Mon, Feb 1, 2010 at 1:22 PM, Zheng Shao zsh...@gmail.com wrote: I would first check whether it is really the block compression or record compression. Also maybe the block size is too small but I am not sure that is tunable in SequenceFile

Re: Question on modifying a table to become external

2010-02-18 Thread Zheng Shao
There is no command to do that right now. One way to go is to create another external table pointing to the same location (and forget about the old table). Or you can move the files first, before dropping and recreating the same table. Zheng On Thu, Feb 18, 2010 at 10:22 AM, Eva Tse

Re: map join and OOM

2010-02-18 Thread Zheng Shao
https://issues.apache.org/jira/browse/HIVE-917 might be what you want (assuming both of the tables are already bucketed on the join column). Zheng On Thu, Feb 18, 2010 at 2:53 PM, Ning Zhang nzh...@facebook.com wrote: 1GB of the small table is usually too large for map-side joins. If the raw

Re: Hive Server Leaking File Descriptors?

2010-02-18 Thread Zheng Shao
HIVE-1181 for branch 0.5. Zheng -- Forwarded message -- From: Andy Kent andy.k...@forward.co.uk Date: Thu, Feb 18, 2010 at 3:17 PM Subject: Re: Hive Server Leaking File Descriptors? To: hive-user@hadoop.apache.org hive-user@hadoop.apache.org On 18 Feb 2010, at 20:29, Zheng Shao

Re: NoClassDef error

2010-02-18 Thread Zheng Shao
The stacktrace that you showed is from the hive cli right? Did you define HADOOP_CLASSPATH somewhere? Hive modifies HADOOP_CLASSPATH so it's important to modify it by export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/my/new/path instead of directly overwriting it. Zheng On Thu, Feb 18, 2010 at 9:22

Re: NoClassDef error

2010-02-17 Thread Zheng Shao
In which directory did you start hive? hive should be started in build/dist Zheng On Wed, Feb 17, 2010 at 2:23 AM, Vidyasagar Venkata Nallapati vidyasagar.nallap...@onmobile.com wrote: Hi , When starting the hive I am getting an error even after I am including in class path, attached is

Re: Help with Compressed Storage

2010-02-17 Thread Zheng Shao
I just corrected the wiki page. It will also be a good idea to support case-insensitive boolean values in the code. Zheng On Wed, Feb 17, 2010 at 9:27 AM, Brent Miller brentalanmil...@gmail.com wrote: Thanks Adam, that works for me as well. It seems that the property for 

Re: hive ant spead ups

2010-02-17 Thread Zheng Shao
I think this is worth exploring. Unit test time keeps growing as we add more code and more tests. Do you want to start a JIRA issue and discuss more about it? Zheng On Wed, Feb 17, 2010 at 8:53 AM, Edward Capriolo edlinuxg...@gmail.com wrote: I made an ant target quick-test, which

Re: Help with Compressed Storage

2010-02-17 Thread Zheng Shao
( like Pig  ) hive automatically detects .bz2 extensions and applies appropriate decompression. Am I wrong ? -Prasen On Thu, Feb 18, 2010 at 3:04 AM, Zheng Shao zsh...@gmail.com wrote: I just corrected the wiki page. It will also be a good idea to support case-insensitive boolean values

Re: Help with Compressed Storage

2010-02-17 Thread Zheng Shao
and appreciate, -Prasen On Thu, Feb 18, 2010 at 11:52 AM, Zheng Shao zsh...@gmail.com wrote: There is no special setting for bz2. Can you get the debug log? Zheng On Wed, Feb 17, 2010 at 9:02 PM, prasenjit mukherjee pmukher...@quattrowireless.com wrote: So I tried the same with  .gz

Re: Help with Compressed Storage

2010-02-17 Thread Zheng Shao
in their hadoop-site.xml.  Is there a way I can pass those parameters from hive, so that I dont need to manually change the file  ? -Thanks, Prasen On Thu, Feb 18, 2010 at 12:54 PM, Zheng Shao zsh...@gmail.com wrote: Just remember that we need to have the BZipCodec class in the following hadoop

[VOTE] release hive 0.5.0

2010-02-15 Thread Zheng Shao
Hive branch 0.5 was created 5 weeks ago: https://svn.apache.org/viewvc/hadoop/hive/branches/branch-0.5/ It has also been running as the production version of Hive at Facebook for 2 weeks. We'd like to start making release candidates (for 0.5.0) from branch 0.5. Please vote. -- Yours, Zheng

Re: Hive Server Leaking File Descriptors?

2010-02-15 Thread Zheng Shao
Can you go to that box, sudo as root, and do lsof | grep 12345 where 12345 is the process id of the hive server? We should be able to see the names of the files that are open. Zheng On Mon, Feb 15, 2010 at 7:42 AM, Andy Kent andy.k...@forward.co.uk wrote: Nope, no luck so far. We have upped

Re: Got sun.misc.InvalidJarIndexException: Invalid index

2010-02-15 Thread Zheng Shao
MySQL is recommended for multiple-node deployment of Hive. Can you try MySQL? Zheng On Mon, Feb 8, 2010 at 6:32 PM, Mafish Liu maf...@gmail.com wrote: Hi, all: I'm deploying hive from node A to node B. Hive on node A works properly while on node B, when I try to create a new table, I got the

Re: SerDe issue

2010-02-12 Thread Zheng Shao
Hi Roberto, The reason that Text is passed in is because the table is defined as TextFile format (the default). There are some examples (*.q files) of using SequenceFile format ( CREATE TABLE xxx STORED AS SEQUENCEFILE). SEQUENCEFILE will return BytesWritable by default. Please have a try.

Re: Hive Installation Error

2010-02-11 Thread Zheng Shao
What commands did you run? With which release? Zheng On Wed, Feb 10, 2010 at 11:20 PM, Vidyasagar Venkata Nallapati vidyasagar.nallap...@onmobile.com wrote: Hi, Installation is giving an error as master/hadoop/hadoop-0.20.1/build.xml:895: 'java5.home' is not defined. Forrest requires

Re: Distributing additional files for reduce scripts

2010-02-11 Thread Zheng Shao
add file myfile.txt; You can find some examples in *.q files in the distribution. Zheng On Thu, Feb 11, 2010 at 10:23 PM, Adam O'Donnell a...@immunet.com wrote: Guys: How do you go about distributing additional files that may be needed by your reduce scripts?  For example, I need to

Re: hive map reduce output

2010-02-09 Thread Zheng Shao
Another possible reason is that we have found the Hadoop framework sometimes does not return the correct count to the clients. In all these cases, the count is smaller than the number of rows actually loaded. Which version of Hadoop are you using? Zheng On Mon, Feb 8, 2010 at 11:27 PM, Jeff Hammerbacher

Re: Lzo problem throwing java.io.IOException:java.io.EOFException

2010-02-09 Thread Zheng Shao
Looks like an LZO codec problem. Can you try a simple MapReduce program that writes LZO-compressed output in the same output file format as your Hive table? On 2/9/10, Bennie Schut bsc...@ebuddy.com wrote: I have a bit of an edge case on using lzo which I think might be related to HIVE-524. When

Re: Using UDFs stored on HDFS

2010-02-08 Thread Zheng Shao
Yes that's correct. I prefer to download the jars in add jar. Zheng On Mon, Feb 8, 2010 at 3:46 PM, Philip Zeyliger phi...@cloudera.com wrote: Hi folks, I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  Can you use a UDF where the jar which contains the function is

Re: LZO Compression on trunk

2010-02-05 Thread Zheng Shao
That seems to be a bug. Are you using hive trunk or any release? On 2/5/10, Bennie Schut bsc...@ebuddy.com wrote: I have a tab separated files I have loaded it with load data inpath then I do a SET hive.exec.compress.output=true; SET

Re: Hive Installation Problem

2010-02-05 Thread Zheng Shao
-Original Message- From: Zheng Shao [mailto:zsh...@gmail.com] Sent: Friday, February 05, 2010 12:47 PM To: hive-user@hadoop.apache.org Subject: Re: Hive Installation Problem Added to http://wiki.apache.org/hadoop/Hive/FAQ Zheng On Thu, Feb 4, 2010 at 11:11 PM, Zheng Shao zsh

Re: Concurrently load data into Hive tables?

2010-02-04 Thread Zheng Shao
We can load data/insert overwrite data concurrently as long as they are different partitions. On Thu, Feb 4, 2010 at 6:51 AM, Ryan LeCompte lecom...@gmail.com wrote: Hey guys, Is it possible to concurrently load data into Hive tables (same table, different partition)? I'd like to concurrently

Re: Question about Hive supporting new Hadoop MapReduce API

2010-02-04 Thread Zheng Shao
We don't have a plan yet. It will be great to draw out the pros/cons of moving to the new MapReduce API. Do you want to open a JIRA to discuss it? Zheng On Thu, Feb 4, 2010 at 5:46 PM, Schubert Zhang zson...@gmail.com wrote: Does anyone know the plan of Hive to support new Hadoop MapReduce

Re: computing median and percentiles

2010-02-04 Thread Zheng Shao
I would say, just create a histogram of (value, count) pairs, sort at the end, and return the value at the percentile. This assumes that the number of unique values is not large, which can be easily enforced by using round(number, digits). Zheng On Thu, Feb 4, 2010 at 9:08 PM, Bryan Talbot
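The histogram approach above can be sketched as follows. Two assumptions are mine: a sorted map is used instead of sorting at the end, and the percentile is taken as the smallest value whose cumulative count reaches ceil(p * total). The class is hypothetical and is not a Hive UDAF.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical histogram-based percentile; illustrative only.
final class HistogramPercentile {
    // value -> count, kept sorted by value
    private final TreeMap<Double, Long> counts = new TreeMap<>();
    private long total = 0;

    // Rounds to `digits` decimal places first, which bounds the number of distinct keys.
    void add(double value, int digits) {
        double scale = Math.pow(10, digits);
        double rounded = Math.round(value * scale) / scale;
        counts.merge(rounded, 1L, Long::sum);
        total++;
    }

    // p in (0, 1]; walks the histogram in value order until the cumulative count reaches the target rank.
    double percentile(double p) {
        if (total == 0) {
            throw new IllegalStateException("no values added");
        }
        long target = Math.max(1, (long) Math.ceil(p * total));
        long seen = 0;
        for (Map.Entry<Double, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (seen >= target) {
                return e.getKey();
            }
        }
        return counts.lastKey();
    }
}
```

Memory is proportional to the number of distinct rounded values, not to the number of rows, which is the point of the round(number, digits) step.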

Re: Hive Installation Problem

2010-02-04 Thread Zheng Shao
Added to http://wiki.apache.org/hadoop/Hive/FAQ Zheng On Thu, Feb 4, 2010 at 11:11 PM, Zheng Shao zsh...@gmail.com wrote: Try this: cd ~/.ant/cache/hadoop/core/sources wget http://archive.apache.org/dist/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz Zheng On Thu, Feb 4, 2010 at 10:23

Re: Resolvers for UDAFs

2010-02-03 Thread Zheng Shao
Can you post the Hive query? What are the types of the parameters that you passed to the function? Zheng On Wed, Feb 3, 2010 at 3:23 AM, Sonal Goyal sonalgoy...@gmail.com wrote: Hi, I am writing a UDAF which takes in 4 parameters. I have 2 cases - one where all the parameters are ints, and

Re: Converting multiple joins into a single multi-way join

2010-02-03 Thread Zheng Shao
See ql/src/test/queries/clientpositive/uniquejoin.q FROM UNIQUEJOIN PRESERVE T1 a (a.key), PRESERVE T2 b (b.key), PRESERVE T3 c (c.key) SELECT a.key, b.key, c.key; FROM UNIQUEJOIN T1 a (a.key), T2 b (b.key), T3 c (c.key) SELECT a.key, b.key, c.key; FROM UNIQUEJOIN T1 a (a.key), T2 b (b.key-1),

Re: Converting multiple joins into a single multi-way join

2010-02-03 Thread Zheng Shao
https://issues.apache.org/jira/browse/HIVE-591 On Wed, Feb 3, 2010 at 1:34 PM, Zheng Shao zsh...@gmail.com wrote: See ql/src/test/queries/clientpositive/uniquejoin.q FROM UNIQUEJOIN PRESERVE T1 a (a.key), PRESERVE T2 b (b.key), PRESERVE T3 c (c.key) SELECT a.key, b.key, c.key; FROM

Re: intermediate data written to the disk?

2010-02-03 Thread Zheng Shao
If the join key is the same, you can use unique join to make sure it's done in a single map-reduce job. Zheng On Wed, Feb 3, 2010 at 1:25 AM, bharath v bharathvissapragada1...@gmail.com wrote: Hi , I have a small doubt in how hive handles queries containing join of more than 2 tables .

Re: Help writing UDAF with custom object

2010-02-03 Thread Zheng Shao
the function to output something and to verify that Hive specific hooks are in place. If you have any suggestions, please do let me know. Thanks and Regards, Sonal On Mon, Feb 1, 2010 at 1:19 PM, Zheng Shao zsh...@gmail.com wrote: The first problem is:                private Integer key

Re: Resolvers for UDAFs

2010-02-03 Thread Zheng Shao
(RunJar.java:156) Thanks and Regards, Sonal On Thu, Feb 4, 2010 at 12:12 AM, Zheng Shao zsh...@gmail.com wrote: Can you post the Hive query? What are the types of the parameters that you passed to the function? Zheng On Wed, Feb 3, 2010 at 3:23 AM, Sonal Goyal sonalgoy...@gmail.com

Re: Resolvers for UDAFs

2010-02-03 Thread Zheng Shao
, Zheng Shao zsh...@gmail.com wrote: Hi Sonal, 1. We usually move the group_by column out of the UDAF - just like we do SELECT key, sum(value) FROM table. I think you should write: SELECT customer_id, topx(2, product_id, product_count) FROM products_bought and in topx: public boolean

Re: SequenceFile compression on Amazon EMR not very good

2010-01-31 Thread Zheng Shao
I would first check whether it is really the block compression or record compression. Also maybe the block size is too small, but I am not sure whether that is tunable in SequenceFile. Zheng On Sun, Jan 31, 2010 at 9:03 PM, Saurabh Nanda saurabhna...@gmail.com wrote: Hi, The size of my Gzipped

Re: UDAF/UDTF question

2010-01-28 Thread Zheng Shao
The easiest way to go is to write a UDAF to return the answer in array<struct<decile:int, value:double>>. Then you can do: (note that explode is a predefined UDTF) SELECT tmp.key, tmp2.d.decile, tmp2.d.value FROM (SELECT key, Decile(value) as deciles GROUP BY key) tmp LATERAL VIEW

Re: help!

2010-01-27 Thread Zheng Shao
Can you take a look at /tmp/user/hive.log? There should be some exceptions there. Zheng On Wed, Jan 27, 2010 at 7:59 PM, Fu Ecy fuzhijie1...@gmail.com wrote: I want to load some files on HDFS to a hive table, but there is an exception as follows: hive> load data inpath

Re: help!

2010-01-27 Thread Zheng Shao
When Hive loads data from HDFS, it moves the files instead of copying the files. That means the current user should have write permissions to the source files/directories as well. Can you check that? Zheng On Wed, Jan 27, 2010 at 11:18 PM, Fu Ecy fuzhijie1...@gmail.com wrote: property  

Re: help!

2010-01-27 Thread Zheng Shao
org.apache.hadoop.hive.ql.exec.MoveTask It doesn't work. 2010/1/28 Fu Ecy fuzhijie1...@gmail.com I think this is the problem, I don't have the write permissions to the source files/directories. Thank you, Shao :-) 2010/1/28 Zheng Shao zsh...@gmail.com When Hive loads data from HDFS, it moves the files instead of copying

Re: Can not run hive 0.4.1

2010-01-26 Thread Zheng Shao
Can you post the traces in /tmp/user/hive.log? Zheng On Tue, Jan 26, 2010 at 12:40 AM, Jeff Zhang zjf...@gmail.com wrote: Hi all, I follow the get started wiki page, but I use the hive 0.4.1 release version rather than svn trunk. And when I invoke command: show tables; It shows the

Re: Can not run hive 0.4.1

2010-01-26 Thread Zheng Shao
)    at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159) ... 29 more On Tue, Jan 26, 2010 at 4:52 PM, Zheng Shao zsh...@gmail.com wrote: Can you post the traces in  /tmp/user/hive.log? Zheng On Tue, Jan 26, 2010 at 12:40 AM, Jeff Zhang zjf...@gmail.com wrote: Hi all, I follow the get

Re: Can not run hive 0.4.1

2010-01-26 Thread Zheng Shao
to resolve dependencies:     resolve failed - see output for details On Tue, Jan 26, 2010 at 6:04 PM, Zheng Shao zsh...@gmail.com wrote: This usually happens when there is a problem in the metastore configuration. Did you change any hive configurations? Zheng On Tue, Jan 26, 2010 at 1:41

Re: How can I implement a cursor in Hive? ...or... Can I implement a CROSS APPLY in Hive?...or... How can I do a FOR or WHILE loop (inside or outside) of Hive?

2010-01-25 Thread Zheng Shao
We can use a combination of UDAF and LATERAL VIEW to implement what you want. 1. Define a UDAF like this: max_n(5, products_bought, customer_id) which returns the top 5 products_bought and their customer_id in type of array<struct<col0:int,col1:int>> 2. Use the Lateral views (with explode) to

Re: Error after loading data

2010-01-22 Thread Zheng Shao
Hi Ankit, org.apache.hadoop.mapreduce.lib.input.XmlInputFormat implements the new mapreduce InputFormat API, while Hive needs an InputFormat that implements org.apache.hadoop.mapred.InputFormat (the old API). This might work:

Re: Deleted input files after load

2010-01-22 Thread Zheng Shao
If you want the files to stay there, you can try CREATE EXTERNAL TABLE with a location (instead of create table + load) Zheng On Fri, Jan 22, 2010 at 10:51 AM, Bill Graham billgra...@gmail.com wrote: Hive doesn't delete the files upon load, it moves them to a location under the Hive warehouse

Re: hive multiple inserts

2010-01-13 Thread Zheng Shao
delimiter with any format. INSERT OVERWRITE LOCAL DIRECTORY '/mnt/daily_timelines' [ ROW FORMAT DELIMITED | SERDE ... ] [ FILE FORMAT ...] SELECT * FROM daily_timelines; Is somebody still working on this feature? On Tue, Jan 12, 2010 at 2:28 PM, Zheng Shao zsh...@gmail.com wrote: Yes we

Re: hive multiple inserts

2010-01-11 Thread Zheng Shao
hi, A single insert can extract data into '/tmp/out/1'. With multiple inserts I can even see xxx rows loaded to '/tmp/out/0', xxx rows loaded to '/tmp/out/1'... etc., but in fact there is no data. I haven't tried the svn revision; I will try it today. Thanks. 2010/1/5 Zheng Shao zsh...@gmail.com Looks like a bug

Re: hive multiple inserts

2010-01-11 Thread Zheng Shao
: Thanks Zheng. It does work. I have another question: if the field delimiter is a string, e.g. , it looks like LazySimpleSerDe can't work. Does LazySimpleSerDe not support string field delimiters, only single-byte control characters? On Tue, Jan 12, 2010 at 3:05 AM, Zheng Shao zsh

Re: Speedup of test target

2010-01-08 Thread Zheng Shao
Unfortunately the trunk does not run tests in parallel yet. The majority of the time is spent in TestCliDriver which contains over 200 .q files. We will need to separate the working directories and metastore directories to make these .q files run in parallel. Zheng On Thu, Jan 7, 2010 at 11:46

Re: hive multiple inserts

2010-01-05 Thread Zheng Shao
Looks like a bug. What is the svn revision of Hive? Did you verify that single insert into '/tmp/out/1' produces non-empty files? Zheng On Tue, Jan 5, 2010 at 12:51 AM, wd w...@wdicc.com wrote: In hive wiki: Hive extension (multiple inserts): FROM from_statement INSERT OVERWRITE [LOCAL]

RE: Populating MAP type columns

2010-01-05 Thread Zheng Shao
Hi Saurabh, I think we can do it with the following 3 UDFs. make_map(trim(split(cookies, ",")), "=") ArrayList<String> split(String) See http://issues.apache.org/jira/browse/HIVE-642 ArrayList<String> trim(ArrayList<String>) Open one for that HashMap<String,String> make_map(ArrayList<String>, String
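The split / trim / make_map chain proposed above can be sketched in plain Java as a single function. The separators passed in (a comma between pairs and an equals sign between key and value in the usage below) and the handling of a pair with no value are assumptions for illustration; the class is hypothetical and is not one of the three UDFs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical plain-Java equivalent of make_map(trim(split(...)), ...).
final class CookieMap {
    static Map<String, String> makeMap(String cookies, String pairSeparator, String keyValueSeparator) {
        Map<String, String> result = new LinkedHashMap<>();
        // Pattern.quote so the separator is treated literally, not as a regex.
        for (String pair : cookies.split(Pattern.quote(pairSeparator))) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty pieces such as a trailing separator
            }
            int idx = trimmed.indexOf(keyValueSeparator);
            if (idx < 0) {
                result.put(trimmed, null); // key without a value
            } else {
                result.put(trimmed.substring(0, idx),
                           trimmed.substring(idx + keyValueSeparator.length()));
            }
        }
        return result;
    }
}
```

For example, makeMap("a=1, b=2 ,c", ",", "=") gives a map with keys a, b and c, where c has no value. Keeping the three steps as separate UDFs, as proposed in the message, makes each one reusable on its own.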

Re: Null values in hive output

2010-01-04 Thread Zheng Shao
Hi Eric, Most probably there are leading/trailing spaces in the columns that are defined as int. If Hive cannot parse the field successfully, the field will become null. You can try this to find out the rows: SELECT * FROM raw_facts WHERE year IS NULL; Zheng On Mon, Jan 4, 2010 at 4:10 PM, Eric
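The effect described above can be reproduced with a strict integer parser. This sketch uses Integer.valueOf as a stand-in for Hive's own parsing, which is an assumption; it only illustrates why a field with a leading space could come back as NULL.

```java
// Hypothetical illustration of "unparseable int becomes null".
final class LenientInt {
    // Returns the parsed value, or null when the text is not a clean integer.
    static Integer parseOrNull(String field) {
        try {
            return Integer.valueOf(field);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

Here parseOrNull(" 2010") returns null because of the leading space, while parseOrNull(" 2010".trim()) returns 2010, so trimming the data before loading is one way out.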
