Re: Null values in hive output

2010-01-04 Thread Zheng Shao
wrote: Hi Zheng, Is there any way to convince the LazySimpleSerde to allow leading/trailing spaces in non-text fields? -Todd On Mon, Jan 4, 2010 at 7:33 PM, Zheng Shao zsh...@gmail.com wrote: Hi Eric, Most probably there are leading/trailing spaces in the columns that are defined

Re: Null values in hive output

2010-01-04 Thread Zheng Shao
REPLACE COLUMNS. Zheng On Mon, Jan 4, 2010 at 10:28 PM, Eric Sammer e...@lifeless.net wrote: On 1/4/10 10:33 PM, Zheng Shao wrote: Hi Eric, Most probably there are leading/trailing spaces in the columns that are defined as int. If Hive cannot parse the field successfully, the field

Re: Re: Chinese word display wrong in hive console, can somebody help me?

2009-12-27 Thread Zheng Shao
How were the Chinese words encoded in the file? Is it UTF-8 or GB? If it's GB, then Hive will have difficulty in converting them to unicode. Please take a look at Driver.java. There is a method to get the results as List<String>. If we get the result as Text (byte array) instead, you can get the

Re: javax.jdo.JDODataStoreException

2009-12-24 Thread Zheng Shao
Can you open /tmp/user/hive.log? It should have the full stack trace. Zheng On Wed, Dec 23, 2009 at 12:17 PM, Nathan Rasch nathan.ra...@returnpath.net wrote: All: I've been setting up Hive using Derby in Server Mode as per the instructions here:  

Re: Show error while accessing data using hive

2009-12-21 Thread Zheng Shao
Mohan, Please take a look at /tmp/username/hive.log. It contains the full stack trace of the problem. It seems like a configuration problem. Zheng On Sun, Dec 20, 2009 at 9:04 PM, Mohan Agarwal mohan.agarwa...@gmail.com wrote: Hi, I have installed hadoop-0.19.2  on my system in a

Re: How do I INSERT OVERWRITE into a new table if it's partitioned?

2009-12-21 Thread Zheng Shao
You are correct. Just opened https://issues.apache.org/jira/browse/HIVE-1002 This is a highly wanted feature from a lot of users. Please comment on the JIRA. Let's figure out how we want to do it. Zheng On Wed, Dec 16, 2009 at 6:36 PM, ken.barc...@wellsfargo.com wrote: Correct me if I’m

Re: CombinedHiveInputFormat combining across tables

2009-12-20 Thread Zheng Shao
Sorry about the delay. Are you using Hive trunk? Filed https://issues.apache.org/jira/browse/HIVE-1001 We should use (new Path(str)).getPath() instead of chopping off the first 5 chars. Zheng On Mon, Dec 14, 2009 at 4:43 PM, David Lerman dler...@videoegg.com wrote: I'm running into errors

Re: Throttling hive queries

2009-12-19 Thread Zheng Shao
We plan to run a vote and branch 0.5 around early Jan. However we do run trunk for some adhoc queries (note that trunk and branch 0.4 can share the metastore and data on hdfs) and branch 0.4 for production queries. Hive trunk does support combine file input format but the fix to hadoop 0.20 was

RE: Hive not using the full mapper capacity for certain jobs

2009-12-10 Thread Zheng Shao
Try this: set mapred.map.tasks=28; Zheng From: Ryan LeCompte [mailto:lecom...@gmail.com] Sent: Thursday, December 10, 2009 1:45 PM To: hive-user@hadoop.apache.org Subject: Hive not using the full mapper capacity for certain jobs Hello all, The cluster has a capacity of 28 concurrent mappers. It

[VOTE] hive release candidate 0.4.1-rc3

2009-12-01 Thread Zheng Shao
are the list of changes: HIVE-884. Metastore Server should call System.exit() on error. (Zheng Shao via pchakka) HIVE-864. Fix map-join memory-leak. (Namit Jain via zshao) HIVE-878. Update the hash table entry before flushing in Group By hash aggregation (Zheng Shao via namit) HIVE-882

Re: FW: Table created using RegexSerDe doesn't work when made external

2009-11-30 Thread Zheng Shao
Yes Ken. Please try http://www.fileformat.info/tool/regex.htm to test your regex to see if it can match your data or not. Zheng On Mon, Nov 30, 2009 at 6:11 PM, ken.barc...@wellsfargo.com wrote: I looked in hive.log while doing the CREATE TABLE and found: 2009-11-30 17:51:53,240 WARN 

Re: Building Hive - cannot resolve dependencies

2009-11-12 Thread Zheng Shao
: Hey Zheng, What do we need to do to fix this? It seems to have bitten a number of people by now. -Todd On Tue, Nov 10, 2009 at 3:50 PM, Zheng Shao zsh...@gmail.com wrote: I am forwarding an earlier email from the same mailing list by search for Downloaded file size doesn't match expected

Re: How to use custom mapreduce script

2009-11-12 Thread Zheng Shao
Yes you can compile it into a jar, and insert the command line java /xxx/my.jar into Hive queries. http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook Page 72 has an example. If your map/reduce function is simple, you can probably write a Hive UDF instead. In the near future,

Hive PoweredBy wiki page

2009-11-12 Thread Zheng Shao
Hi Hive users, Would you please add your company's name and a little description of how you used Hive on the following wiki page? This helps new users get more ideas about how Hive can be used and where it is used now. http://wiki.apache.org/hadoop/Hive/PoweredBy -- Yours, Zheng

Re: Building Hive - cannot resolve dependencies

2009-11-10 Thread Zheng Shao
I am forwarding an earlier email from the same mailing list by search for Downloaded file size doesn't match expected Content Length: Hi Rahul, Please follow these steps: 1) In your hive source directory run 'ant clean'. 2) remove the contents of ~/.ant/cache/hadoop/core/sources 3) Download

Re: Problem installing Hive on hadoop 0.20.1

2009-11-07 Thread Zheng Shao
Hi Massoud, Once you did ant package, you will need to go into build/dist and then run bin/hive Zheng On Fri, Nov 6, 2009 at 11:27 AM, Ning Zhang nzh...@facebook.com wrote: Sorry there was a typo in my previous email: please replace testcaes to testcase in the unit test command. Basically

Re: key part of sequence files

2009-11-05 Thread Zheng Shao
Hi Bobby, Can you open a jira and attach a patch? We can put that to contrib. Zheng On 11/5/09, Bobby Rullo bo...@metaweb.com wrote: Andrey, Here you go: http://pastebin.com/m5724ce8a Bobby On Nov 5, 2009, at 8:59 AM, Andrey Pankov wrote: Thanks Bobby. Yeah, could be nice to take a

Re: Problem regarding external table

2009-11-03 Thread Zheng Shao
Hi Mohan, Most probably there are some exceptions in the process. Can you take a look at /tmp/user/hive.log ? Also, for SELECT count(1) Hive should generate a mapreduce job. Did you see that map-reduce job running? Did the map task and reduce task run smoothly? Can you take a look at their

Re: Problem regarding Hive Command Line Interface

2009-11-03 Thread Zheng Shao
Yes, but you would need to set up the Hive MySQL metastore. It's on the Hive wiki I believe. Zheng On Tue, Nov 3, 2009 at 11:10 PM, Mohan Agarwal mohan.agarwa...@gmail.com wrote: Hi, Can I run multiple Hive CLIs from different systems pointing at a common hadoop? Thanking You Mohan Agarwal

Re: Creating and populating bucketed tables

2009-10-28 Thread Zheng Shao
) CLUSTERED BY(userid) INTO 256 BUCKETS; Is it possible to specify more than one key in the CLUSTERED BY(...) clause? Also, if I am clustering my tables, where/when would I expect to get improved performance in Hive queries? Thanks, Ryan On Sat, Oct 24, 2009 at 6:56 PM, Zheng Shao zsh
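
For reference, a hypothetical full form of the bucketed-table DDL quoted above (the table and column names are assumptions, not taken from the original thread):

```sql
-- Bucketing sketch: rows are hashed on userid into 256 buckets.
CREATE TABLE user_info (userid INT, name STRING)
CLUSTERED BY (userid) INTO 256 BUCKETS;
```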

Re: Hive set up

2009-10-27 Thread Zheng Shao
Hi Rahul, I think you are treating the svn directory as HIVE_HOME. If you do ant package, HIVE_HOME should set to build/dist. Zheng On Tue, Oct 27, 2009 at 1:19 AM, Rahul Pal rahul@one97.net wrote: I copied the files (*hadoop-0.19.0.tar.gz and hadoop-0.20.0.tar.gz*) to *

Re: Issues with joining across large tables

2009-10-26 Thread Zheng Shao
It's probably caused by the Cartesian product of many rows from the two tables with the same key. Zheng On Sun, Oct 25, 2009 at 7:22 PM, Ryan LeCompte lecom...@gmail.com wrote: It also looks like the reducers just never stop outputting things like the following (see below), causing them to

Re: join in hive

2009-10-25 Thread Zheng Shao
Mostly correct. 2. Your idea looks interesting but I would say in reality, the percentage of tuples purged may not be that large. 4. Hive does NOT treat the partition column differently than others. 5. There is no sort-merge join yet. This would be a great feature to add onto Hive! Zheng

Re: ERROR DataNucleus.Plugin problem with hive

2009-10-24 Thread Zheng Shao
We also saw that message. Prasad, do you have any idea on that? 2009-10-22 10:50:48,475 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle org.eclipse.jdt.core requires org.eclipse.core.resources but it cannot be resolved. 2009-10-22 10:50:48,475 ERROR DataNucleus.Plugin

Re: Hive query questions: referring to aliased column in group by and comments

2009-10-23 Thread Zheng Shao
1) Hive does not support that yet. If you don't want to repeat the expression, there is one work-around with sub query: SELECT myalias, count(1) FROM (SELECT if(col='x', 1, 0) as myalias) tmp GROUP BY myalias 2) Yes. Comment lines begin with -- . It's the standard SQL comment format. Zheng On
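
Spelled out with a hypothetical source table, the subquery workaround described above would read (mytable and col are placeholder names, not from the original mail):

```sql
-- Compute the expression once under an alias in a subquery,
-- then refer to the alias in the outer GROUP BY.
SELECT myalias, count(1)
FROM (SELECT if(col = 'x', 1, 0) AS myalias FROM mytable) tmp
GROUP BY myalias;
```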

Re: Key for table stored as sequence file

2009-10-23 Thread Zheng Shao
Hi Vinay, If you are adding a partition of data, you will need to run ALTER TABLE xxx ADD PARTITION(...). You can also try https://issues.apache.org/jira/browse/HIVE-142 The key part does not matter - both BytesWritable/LongWritable should work. Hive ignores the data in key. Zheng On Thu, Oct

Re: I can't use hive client while using hive server

2009-10-19 Thread Zheng Shao
See http://wiki.apache.org/hadoop/Hive/GettingStarted#Metadata_Store 2009/10/18 Clark Yang (杨卓荦) clarkyzl-h...@yahoo.com.cn When I use $hive --service hiveserver as a service for JDBC client. Then I use $hive But any use of HiveQL on hive client will not be available? Why does

Re: hive-0.4.0 build

2009-10-19 Thread Zheng Shao
Hi Schubert, I am not an expert on ivy, but is it possible to change the md5 check logic in ivy? We might want to support both .md5 file formats. If you are interested in going this route, please open a JIRA and we can work together on that. Also, for the time being, you might want to just skip

Re: Are udf classes treated as singletons?

2009-10-15 Thread Zheng Shao
We use a single instance of the udf for each node in the expression tree. For example, in a+b, + will be called by all the rows of the table, and we have a single instance of +. However, in a+b+c, there will be 2 instances of +. in a lot of the udfs, we already do such initializations. Take a

Re: Hive vs. DryadLINQ

2009-10-15 Thread Zheng Shao
Hi Qing, Talking about high-level design and architecture, I think the ideas proposed in Hive will help SQL-to-DryadLINQ translation as well. Hive internally translates the SQL query into a DAG plan which should fit Dryad - but with the limitation of Hadoop, we have to cut the DAG plan into

Re: Why still can not I use Hive?

2009-10-15 Thread Zheng Shao
The error message from Hive.log: Caused by: org.datanucleus.exceptions.NucleusException: Plugin (Bundle) org.eclipse.jdt.core is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL

Re: Why still can not I use Hive?

2009-10-15 Thread Zheng Shao
/etc/init.d to make sure that the Hadoop would be started automatically. Is there anything wrong? How can I fix it? Thank you. -- *From:* Zheng Shao zsh...@gmail.com *To:* hive-user@hadoop.apache.org *Sent:* Friday, October 16, 2009 12:20:48 PM *Subject:* Re: Why still can not I

Re: Current Hive Optimizer

2009-10-13 Thread Zheng Shao
ASTs are built by Hive.g (using antlr). The AST-to-OperatorTree translation is done by SemanticAnalyzer.java. Optimizations are done by Transformer.java and its subclasses. Hope this helps you get started. Zheng On Tue, Oct 13, 2009 at 10:01 AM, bharath v bharathvissapragada1...@gmail.com wrote: Thanks for

Re: Problem of queries hanging with DynamicSerDe

2009-10-12 Thread Zheng Shao
Hi Vijay, DynamicSerDe is deprecated. Please use the following SerDe instead: https://issues.apache.org/jira/browse/HIVE-662 Can you point us to where you see this example? We should update it with RegexSerDe. Zheng On Mon, Oct 12, 2009 at 4:46 PM, Vijay tec...@gmail.com wrote: Hi, I have

Re: How can I make hive available for Hadoop 0.20.1

2009-10-12 Thread Zheng Shao
Hi Clark, hive release 0.4.0 is just out. It's compatible with hadoop 0.17 to 0.20 Please download it from https://svn.apache.org/viewvc/hadoop/hive/tags/release-0.4.0/ Zheng On Mon, Oct 12, 2009 at 7:46 PM, 杨卓荦 clarkyzl-h...@yahoo.com.cn wrote: I access to the

Re: Multiple aggregated metrics in one query

2009-10-10 Thread Zheng Shao
) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:251) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319) ... 7 more Any ideas? Thanks, Ryan On Sat, Oct 10, 2009 at 2:47 PM, Zheng Shao zsh...@gmail.com wrote: Yes, we can do

Re: Hive ignores key when reading sequence files?

2009-10-06 Thread Zheng Shao
Hi Bobby, We just need a special FileInputFormat - The FileInputFormat should be able to read SequenceFile, and then prepend the key to the value before it's returned to the Hive framework. Then in Hive language, we can say: add jar my.jar; CREATE TABLE mytable (key STRING, value STRING) STORED
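
A hypothetical completion of the truncated DDL above; the input format class name is an assumption (it would be whatever custom class is packaged in my.jar), and the output format shown is only one plausible choice:

```sql
-- Register the jar containing the custom input format, then point
-- the table at it so the sequence-file key is prepended to the value.
ADD JAR my.jar;
CREATE TABLE mytable (key STRING, value STRING)
STORED AS
  INPUTFORMAT 'com.example.KeyPrependingSequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```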

Re: Hive Super Highway suggestion HiveRegionServer

2009-10-05 Thread Zheng Shao
: On Sun, Oct 4, 2009 at 7:24 PM, Zheng Shao zsh...@gmail.com wrote: +1 Making input data and query results available in a short delay is definitely a very attractive feature for Hive. There are multiple approaches to achieve this, mainly depending on how much we leverage HBase

Re: JSON Column Type

2009-10-04 Thread Zheng Shao
, 2009, at 1:28 AM, Zheng Shao wrote: I got it. You mean TimeSpentQuerying, PageType, TotalRevenue and UserAgent are all UDFs that takes a JSON object and outputs a STRING. Exactly! Allowing such a new object type just for UDFs are simpler than supporting a new type in all parts of the system

Re: Hive Super Highway suggestion HiveRegionServer

2009-10-04 Thread Zheng Shao
+1 Making input data and query results available in a short delay is definitely a very attractive feature for Hive. There are multiple approaches to achieve this, mainly depending on how much we leverage HBase. The simplest way to go is to probably have a good Hive/HBase integration like

Re: JSON Column Type

2009-10-03 Thread Zheng Shao
this: CREATE TABLE txn_logs (tid String, txn JSON); Thanks, Bobby On Oct 2, 2009, at 10:54 PM, Zheng Shao wrote: We have 2 example serdes, one for text data (regexserde), one for binary data (thriftserde). But the simplest solution for this is to add a udf get_json_objects that returns

Re: Language settings within Hive and HDFS

2009-09-26 Thread Zheng Shao
Hi Tom, Currently Hive/Hadoop recognizes data as UTF-8. If your encoding is different, most likely you can still process the data using Hive without any problems, as long as Hive/Hadoop does not have to do UTF-8 decoding. What is the row format of your data? Fields separated by TAB or

Re: hive can't built by hadoop-0.20.0

2009-09-20 Thread Zheng Shao
Hi, You might not have Internet access to download the required tgz file for hive. You can do ant clean and remove ~/.ivy2 and try again. Also, now you don't need to specify -Dhadoop.version=... when doing ant package any more. Hive should directly work with hadoop 0.20 without any problems.

Re: saving table into local file as textfile

2009-09-20 Thread Zheng Shao
.crc files are checksum files by hadoop. They can be ignored. The first 2 files (non-crc) ARE text files - just replace Ctrl-A (^A, ascii code 1) with TAB, then you get what you want. Zheng On Thu, Sep 17, 2009 at 5:52 AM, Avishay Livne avish...@il.ibm.com wrote: Hi, I am trying to save a
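
That Ctrl-A-to-TAB replacement can be sketched in a couple of lines (the sample input below is made up, not from the original thread):

```python
# Hive's default field delimiter in text output is Ctrl-A (ASCII code 1).
# Replacing it with TAB turns each line into a conventional TSV record.
def ctrl_a_to_tab(line: str) -> str:
    return line.replace('\x01', '\t')

print(ctrl_a_to_tab('1\x01apple\x012.50'))
```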

Re: hive can't built by hadoop-0.20.0

2009-09-20 Thread Zheng Shao
right? 2009/9/20 Zheng Shao zsh...@gmail.com Hi, You might not have Internet access to download the required tgz file for hive. You can do ant clean and remove ~/.ivy2 and try again. Also, now you don't need to specify -Dhadoop.version=... when doing ant package any more. Hive should

Re: Strange behavior during Hive queries

2009-09-16 Thread Zheng Shao
You mean 14 mappers running concurrently, correct? How many mappers in total for the hive query? Zheng On Wed, Sep 16, 2009 at 6:50 AM, Brad Heintz brad.hei...@gmail.com wrote: There are 14 mappers spawned when I do a Hive query - over 7 nodes. Other jobs spawn 7 nodes per mapper (total of

Re: Custom serde for parsing

2009-09-10 Thread Zheng Shao
On Thu, Sep 10, 2009 at 9:05 PM, Mayuran Yogarajah mayuran.yogara...@casalemedia.com wrote: Zheng Shao wrote: 1. Yes the performance will be affected, especially we are doing one regex match per row, as well as creating a lot of String objects. If we define them as int and uses the default row

Re: Class not found

2009-09-10 Thread Zheng Shao
CAST(CAST(col AS DOUBLE) AS BIGINT) Zheng On Thu, Sep 10, 2009 at 10:10 PM, Mayuran Yogarajah mayuran.yogara...@casalemedia.com wrote: Zheng Shao wrote: You need to run add jar before running the SELECT command. Don't copy it to $hadoop_dir/lib - That will make upgrading Hive much harder

Re: FAILED: Unknown exception Everywhere

2009-09-08 Thread Zheng Shao
Hi, please try to open /tmp/username/hive.log and find the last exception. Zheng On Sun, Sep 6, 2009 at 12:07 PM, Ryan Rosario uclamath...@gmail.com wrote: Hi, I am trying to learn Hive, but am having problems as I am getting a lot of error messages that do not hint at how to resolve

Re: streaming debug

2009-09-07 Thread Zheng Shao
We can do a try except in the loop that process data line by line, to print out the stack trace in python, as well as the line of data that caused the problem. Zheng On Sun, Sep 6, 2009 at 8:46 PM, Min Zhou coderp...@gmail.com wrote: Hi all, Currently, debugging a streaming written in other

Re: column based storage in hive

2009-09-04 Thread Zheng Shao
You can use Hive's TRANSFORM clause to do hadoop-streaming-like custom map-reduce jobs. You won't be restricted in any sense. Zheng On Fri, Sep 4, 2009 at 4:48 PM, Abhijit Pol a...@rocketfuelinc.com wrote: Hive supports column based storage using RC files.

Re: Array index out of bounds exception ...

2009-08-31 Thread Zheng Shao
Yes, see https://issues.apache.org/jira/browse/HIVE-719 Zheng On Sun, Aug 30, 2009 at 9:03 PM, Eva Tse e...@netflix.com wrote: We run the following query: create table test_map (other_properties mapstring, string); insert overwrite table test_map select other_properties from log_table

Re: Running ant tests in contrib directory

2009-08-26 Thread Zheng Shao
ant -Dtestcase=TestContribCliDriver -Dqfile=dboutput.q test Zheng On Wed, Aug 26, 2009 at 8:10 AM, Edward Capriolo edlinuxg...@gmail.com wrote: I am unable to run a specific ant test on one q file in contrib... contrib]$ ant -Dtestcase=TestCliDriver -Dqfile=dboutput.q test This does not

Re: UDF Function Error

2009-08-25 Thread Zheng Shao
(URLDecoder.java:173) at com.sharethis.norm.URLDecode.evaluate(URLDecode.java:25) ... 15 more On Thu, Aug 20, 2009 at 10:48 PM, Zheng Shao zsh...@gmail.com wrote: http://svn.apache.org/viewvc/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java?revision=806034 Can you

Re: Output compression not working on hive-trunk (r802989)

2009-08-25 Thread Zheng Shao
-- http://pastebin.com/m59d5a84b I don't see compression anywhere. Saurabh. On Fri, Aug 21, 2009 at 11:30 AM, Zheng Shao zsh...@gmail.com wrote: Hi Suarabh, Sorry for the delay on this. We are busy with the production this week. I don't think there is much difference in CLI queries and JDBC

Re: How to decrease the number of Mappers (not reducers) ?

2009-08-25 Thread Zheng Shao
I guess you have a lot of small files in the table. Can you merge those small files into bigger files? Zheng On Tue, Aug 25, 2009 at 1:08 PM, Ravi Jagannathan ravi.jagannat...@nominum.com wrote: There are too many mappers in Hive. Table has approximately 50K rows, number of bytes =

Re: Output compression not working on hive-trunk (r802989)

2009-08-21 Thread Zheng Shao
difference in how CLI queries and JDBC queries are treated? Saurabh. On Tue, Aug 18, 2009 at 11:19 AM, Zheng Shao zsh...@gmail.com wrote: Hi Saurabh, So the compression flag is correct when the plan is generated. When you run the query, you should see plan = xxx.xml in the log file. Can you open

Re: File create perm. denied on starting hive

2009-08-20 Thread Zheng Shao
Can you check the mapred.tmp.dir in your hadoop-site.xml/hadoop-default.xml? It seems that Hadoop is unable to write to that directory. Zheng On Thu, Aug 20, 2009 at 12:30 PM, vinay gupta vingup2...@yahoo.com wrote: Hello hive users, I am getting the following exception while running bin/hive

Re: Output compression not working on hive-trunk (r802989)

2009-08-17 Thread Zheng Shao
The default log level is WARN. Please change it to INFO. hive.root.logger=INFO,DRFA Of course you can also use LOG.warn() in your test code. Zheng On Sun, Aug 16, 2009 at 11:58 PM, Saurabh Nanda saurabhna...@gmail.com wrote: I still can't find the log output anywhere. The log file is in

Re: Output compression not working on hive-trunk (r802989)

2009-08-17 Thread Zheng Shao
, Zheng Shao zsh...@gmail.com wrote: The default log level is WARN. Please change it to INFO. hive.root.logger=INFO,DRFA Of course you can also use LOG.warn() in your test code. Zheng -- http://nandz.blogspot.com http://foodieforlife.blogspot.com -- http://nandz.blogspot.com http

Re: Output compression not working on hive-trunk (r802989)

2009-08-14 Thread Zheng Shao
Great. We are one step closer to the root cause. Can you print out a log line here as well? This is the place that we fill in the compression option. SemanticAnalyzer.java:2711: Operator output = putOpInsertMap( OperatorFactory.getAndMakeChild( new fileSinkDesc(queryTmpdir,

Re: Output compression not working on hive-trunk (r802989)

2009-08-14 Thread Zheng Shao
(HiveConf.ConfVars.COMPRESSRESULT)); Saurabh. On Fri, Aug 14, 2009 at 12:39 PM, Zheng Shao zsh...@gmail.com wrote: Great. We are one step closer to the root cause. Can you print out a log line here as well? This is the place that we fill in the compression option. SemanticAnalyzer.java:2711:    Operator

Re: Output compression not working on hive-trunk (r802989)

2009-08-13 Thread Zheng Shao
Hi Saurabh, hive.exec.compress.output=true is the correct option. Can you post the insert command that you run which produced non-compressed results? Is the output in TextFileFormat or SequenceFileFormat? Zheng On Wed, Aug 12, 2009 at 10:52 PM, Saurabh Nanda saurabhna...@gmail.com wrote: I've

Re: Some questions on hive SELECT/UNION - how to do multiple counts in one query?

2009-08-13 Thread Zheng Shao
SELECT day, SUM(IF(request like '%foo%', 1, 0)), SUM(IF(request like '%bar%', 1, 0)) FROM accesslogs group by day order by day; On Thu, Aug 13, 2009 at 4:42 PM, Vijay tec...@gmail.com wrote: Hi, I have some questions about using SELECT with UNION. I have a number of access log files that

Re: Building hive from trunk

2009-08-11 Thread Zheng Shao
Hi Saurabh, The summary is that we want to make it possible to compile Hive with one version of Hadoop, and let that compiled package work on any versions of Hadoop. Please see the following jiras for details: https://issues.apache.org/jira/browse/HIVE-487

Re: Can I filter rows in SerDe ?

2009-08-06 Thread Zheng Shao
This is not supported right now. Opened HIVE-733 for that https://issues.apache.org/jira/browse/HIVE-640 Zheng On Thu, Aug 6, 2009 at 12:12 AM, Andraz Tori and...@zemanta.com wrote: Is there a possibility of throwing an exception when parsing a specific line, without causing the whole task to

Re: Errors pushing to output S3

2009-08-06 Thread Zheng Shao
Hi Neal, It seems that the exception is thrown at: org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException Is there any parameter to adjust the s3 file system? Zheng On Thu, Aug 6, 2009 at 7:32 AM, Neal Richter nrich...@gmail.com wrote: Any response on this?  We had

Re: How Can I debug Hive in local enviroment ?

2009-08-03 Thread Zheng Shao
Hi Jianfeng, That is not currently supported, because Hive starts a new JVM to run the local map-reduce job in local mode. If you are interested in more details on how to modify it, you can take a look at the TaskFactory.java:98-102. We just need to always use ExecDriver (instead of MapRedTask)

Re: Load data inpath w/o overwrite does not move the files

2009-08-03 Thread Zheng Shao
I reproduced the bug and opened HIVE-718. Zheng On Fri, Jul 31, 2009 at 6:01 PM, Eva Tse e...@netflix.com wrote: We discovered a problem where loading into a new partition in hive w/o specifying ‘overwrite’ doesn’t work. If the specified partition does not exist yet, running the following

Re: Multiple input files for a single map task?

2009-07-29 Thread Zheng Shao
Hi Andraz, It's not supported right now. This will be supported when Hive moves to hadoop 0.20 (with multi-file inputformat). An alternative is to merge these smaller files via a query like this: set hive.exec.reducers.bytes.per.reducer=10; INSERT OVERWRITE t SELECT * FROM t DISTRIBUTE
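
A possible completion of the truncated merge query above; both the byte threshold and the DISTRIBUTE BY key are assumptions, not the exact values from the original mail (note that a *large* bytes-per-reducer value is what keeps the reducer count, and therefore the output file count, small):

```sql
-- Allow each reducer to handle up to ~1GB, so only a few reducers run
-- and the rewrite consolidates many small files into a few large ones.
set hive.exec.reducers.bytes.per.reducer=1000000000;
INSERT OVERWRITE TABLE t SELECT * FROM t DISTRIBUTE BY rand();
```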

Re: Re: bz2 Splits.

2009-07-28 Thread Zheng Shao
Yes we do compress all tables. Zheng On Mon, Jul 27, 2009 at 11:08 PM, Saurabh Nanda saurabhna...@gmail.com wrote: In our setup, we didn't change io.seqfile.compress.blocksize (1MB) and it's still fairly good. You are free to try 100MB for better compression ratio, but I would recommend to

Re: partitions not being created

2009-07-28 Thread Zheng Shao
Can you send the output of these 2 commands? describe extended ApiUsage; describe extended ApiUsageTemp; Zheng On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham billgra...@gmail.com wrote: Thanks for the tip, but it fails in the same way when I use a string. On Tue, Jul 28, 2009 at 6:21 PM, David

Re: I'm having to do LOAD DATA LOCAL INPATH two times to add data

2009-07-27 Thread Zheng Shao
Hi Vijay, What version of Hive are you using? Can you attach /tmp/your_unix_user/hive.log so we can see what might be happening? Zheng On Mon, Jul 27, 2009 at 1:24 PM, Vijay tec...@gmail.com wrote: Hi, I'm pretty new to hadoop/hive. I have everything running pretty good on a single server. I

Re: counting different regexes in a single pass

2009-07-27 Thread Zheng Shao
Hi Andraz, I just opened a JIRA for AWS S3 log format. Can you attach a patch file to: https://issues.apache.org/jira/browse/HIVE-693 ? For your question, I think the approach suggested by David Lerman should work fine. Thanks, Zheng On Mon, Jul 27, 2009 at 11:35 AM, Andraz

Re: Re: bz2 Splits.

2009-07-27 Thread Zheng Shao
I cannot imagine there is such a huge compression ratio difference. On our side, the compression ratio of gzip and GzipCodec (BLOCK) are within 10% relative difference. Log file compression ratio is usually 5x to 15x, so 250MB looks like a good one. The 1600MB number looks like record-level

Re: Re: bz2 Splits.

2009-07-27 Thread Zheng Shao
Hi Saurabh, The right configuration parameter is: set mapred.output.compression.type=BLOCK; Sorry about pointing you to the wrong configuration parameter. Zheng On Mon, Jul 27, 2009 at 10:02 PM, Saurabh Nanda saurabhna...@gmail.com wrote: The 1600MB number looks like record-level compression.

Re: Re: bz2 Splits.

2009-07-27 Thread Zheng Shao
In our setup, we didn't change io.seqfile.compress.blocksize (1MB) and it's still fairly good. You are free to try 100MB for better compression ratio, but I would recommend to keep the default setting to minimize the possibilities of hitting unknown bugs. Zheng On Mon, Jul 27, 2009 at 10:38 PM,

Re: Re: bz2 Splits.

2009-07-25 Thread Zheng Shao
Hi Saurabh, If you want to load data (in compressed/uncompressed text format) into a table, you have to define the table as stored as textfile instead of stored as sequencefile. Can you try again and let us know? Zheng On Sat, Jul 25, 2009 at 3:05 AM, Saurabh Nanda saurabhna...@gmail.com

Re: Re: bz2 Splits.

2009-07-25 Thread Zheng Shao
Both TextFile and SequenceFile can be compressed or uncompressed. TextFile means the plain text file (records delimited by \n). Compressed TextFiles are just text files compressed by gzip or bzip2 utility. SequenceFile is a special file format that only Hadoop can understand. Since your files

Re: Importing log files in custom (non-delimited) format

2009-07-25 Thread Zheng Shao
If there is any reason that the apache log format cannot be changed (for example, Hive is not the only consumer) You might want to try to use the RegexSerDe that is added to Hive several days ago: http://issues.apache.org/jira/browse/HIVE-662 Zheng On Sat, Jul 25, 2009 at 6:37 AM, Saurabh

Re: how to write a SerDe

2009-07-24 Thread Zheng Shao
Sorry about the delay on this. Here are several example SerDes that got added to the code base recently: RegexSerDe: A SerDe for parsing text using regex (and an example for parsing Apache Log using a regex) https://issues.apache.org/jira/browse/HIVE-167

Re: loading data from HDFS or local file to

2009-07-22 Thread Zheng Shao
If the huge file is already on HDFS (load data WITHOUT local), Hive will just *move* the file into the table (NOTE: that means user won't be able to see the file in its original directory afterwards) If you don't want that to happen, you might want to use CREATE EXTERNAL TABLE LOCATION
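
A minimal sketch of the external-table alternative mentioned above (the path and schema are hypothetical):

```sql
-- The data stays where it already is on HDFS; no file move happens,
-- and dropping the table later removes only the metadata, not the files.
CREATE EXTERNAL TABLE raw_logs (line STRING)
LOCATION '/user/data/raw_logs';
```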

Re: Importing log files in custom (non-delimited) format

2009-07-22 Thread Zheng Shao
Hi Saurabh, Sorry for the late reply. You can create a table using this: https://issues.apache.org/jira/browse/HIVE-637 And then use the newly added UDF: https://issues.apache.org/jira/browse/HIVE-642 to read in the data. In this way, you won't need to write any Java code. Let us know if you

Re: Newbie Question - Error reading example files

2009-07-22 Thread Zheng Shao
Hi Ray, This error usually happens if Hive.g is updated but ant clean is not run before ant package. Can you try ant clean and rebuild the code? Zheng On Mon, Jul 20, 2009 at 7:57 AM, Ray Duong ray.du...@gmail.com wrote: Hi Tim, I'm still getting an error, when specifying the column names.

Re: bz2 Splits.

2009-07-21 Thread Zheng Shao
There are some work along this direction in the hadoop land, but it's not committed yet: https://issues.apache.org/jira/browse/HADOOP-4012 For the short term, we won't be able to split bzip files. If your bzip files are generated outside of hadoop, please split the files before doing compression

Re: insert into not supported?

2009-07-21 Thread Zheng Shao
Yes that's exactly what we do here. See my reply in an earlier email. Zheng On Tue, Jul 21, 2009 at 10:38 PM, Saurabh Nanda saurabhna...@gmail.com wrote: Currently, hive only supports overwriting the data – appending is not supported Never ran into this problem till now, but will soon begin

Re: Creating a UDF (was Hive SerDe?)

2009-07-16 Thread Zheng Shao
Hi Saurabh, Hive supports both UDF and GenericUDF. UDFs are much easier to write, but they are currently limited to working with primitive types (including String). GenericUDF supports advanced features including complex type parameters/return values, short-circuit computation, complete object reuse

Re: Classpath question with testcase and UDF

2009-07-16 Thread Zheng Shao
Hi Edward, We currently don't allow UDF/GenericUDF to output multiple rows with a single call to evaluate(...). Is that feature going to block you? Also, if you expect an argument to be a String, we should do: 1. Check the type of that argument in the initialize by doing: if (!(arguments[i]

Re: how can I locate the error when test failed?

2009-07-15 Thread Zheng Shao
Hi Min, Try removing build/ql/tmp/hive.log and rerunning the test by specifying -Dtestcase=xxx. If it's TestCliDriver, you can specify -Dqfile=yyy.q for running only that qfile, and -Dtest.silent=false will produce more error messages for map-reduce jobs. Zheng On Tue, Jul 14, 2009 at 11:47 PM,

Re: Hive SerDe?

2009-07-14 Thread Zheng Shao
https://issues.apache.org/jira/browse/HIVE Click create new issue On Mon, Jul 13, 2009 at 11:50 PM, Saurabh Nanda saurabhna...@gmail.com wrote: This command does not take quotations for some historical reasons. We will fix it in the future. Where do I add a bug for this? Saurabh. --

Re: DATETIME or TIMESTAMP data type?

2009-07-13 Thread Zheng Shao
Hi Saurabh, Hive does not have a native date/time data type. In all the cases that we have seen, storing timestamp as BIGINT or STRING is good enough for our users' applications. There is a set of UDFs for date/time stored as bigint/string:

Re: how to define my metastore since jpox be removed

2009-07-12 Thread Zheng Shao
Hi Min, Can you try ant clean and rebuild the project? It's possible that some files are still referencing the old jpox jars. Zheng On Sun, Jul 12, 2009 at 7:46 PM, Min Zhou coderp...@gmail.com wrote: I've replaced my hive-default.xml with the new one, came across the same exception. The

Re: simple join failing with ClassCastException

2009-07-10 Thread Zheng Shao
Hi David, Thanks for letting us know. I will take a look now. In the meantime, there is a fix related to types https://issues.apache.org/jira/browse/HIVE-624 which might solve the problem. You might want to try it out. Zheng On Fri, Jul 10, 2009 at 11:28 AM, David Lerman dler...@videoegg.com

Re: simple join failing with ClassCastException

2009-07-10 Thread Zheng Shao
(TaskTracker.java:2198) Dave On 7/10/09 3:17 PM, Zheng Shao zsh...@gmail.com wrote: Hi David, Thanks for letting us know. I will take a look now. In the meanwhile, there is a fix related to types https://issues.apache.org/jira/browse/HIVE-624 which might solve the problem. You might want

Re: simple join failing with ClassCastException

2009-07-10 Thread Zheng Shao
operations.    [javac] Note: Recompile with -Xlint:unchecked for details.    [javac] 2 errors On 7/10/09 4:42 PM, Zheng Shao zsh...@gmail.com wrote: Hi David, Please do ant clean before running ant -Dhadoop.version=0.18.x package Zheng On Fri, Jul 10, 2009 at 12:45 PM, David Lerman dler

Re: Is [NOT] NULL operators issues

2009-07-10 Thread Zheng Shao
Hi Eva, Can you give us two lines of the data so that we can debug? Also, what does select count(1) from tablename return? Zheng On Thu, Jul 9, 2009 at 5:38 PM, Eva Tse e...@netflix.com wrote: When we load the output generated by the reducer to hive, we run into some issues with ‘is NULL’

Re: unicode supporting in hive

2009-07-08 Thread Zheng Shao
However, UTF-8 is hard-coded in a lot of places in Hive (actually, also hadoop, see Text.java). If you want to use a different encoding like GBK, we will probably need to extract that UTF-8 out from all the code. Zheng On Wed, Jul 8, 2009 at 12:03 AM, Zheng Shao zsh...@gmail.com wrote: Hi Min,

Re: Can't start hive after setting HIVE_AUX_JARS_PATH ...

2009-07-08 Thread Zheng Shao
It seems hadoop 0.20 RunJar.java does not like the -libjars option. Can you try removing the -libjars xxx from the command line? It's added somewhere in bin/hive (or scripts that got called by bin/hive) Zheng On Wed, Jul 8, 2009 at 3:45 PM, Eva Tse e...@netflix.com wrote: We set the env variable

Re: Issue with nested types

2009-07-07 Thread Zheng Shao
Hi Rakesh, Your analysis is correct overall. The specification of delimiters in DDL statement (create table ...) is invented when we only allow a single level of list or map. If there are multiple levels, these delimiter specifications won't work as you expect. For now, please do the following

Re: How to store list of maps?

2009-07-02 Thread Zheng Shao
Hi Rakesh, Internally Hive does support list of maps, and even more nested levels. However, we didn't open it up for the create table command yet. That shouldn't be hard to do though. We will require the list to be separated by ^B, and map items to be separated by ^C, while map key and value
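
As an illustration of that nested layout, a small parser sketch follows. The ^B and ^C roles are taken from the message above; the key/value separator is shown here as ^D, which is an assumption since the original message is truncated before naming it:

```python
# Parse one field containing a list of maps, assuming:
#   ^B (\x02) separates list items,
#   ^C (\x03) separates map entries,
#   ^D (\x04) separates each key from its value (assumed separator).
def parse_list_of_maps(field):
    maps = []
    for item in field.split('\x02'):
        entry = {}
        for pair in item.split('\x03'):
            if pair:
                key, _, value = pair.partition('\x04')
                entry[key] = value
        maps.append(entry)
    return maps

print(parse_list_of_maps('a\x041\x03b\x042\x02c\x043'))
```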
