Hi Zheng,
Is there any way to convince the LazySimpleSerDe to allow leading/trailing
spaces in non-text fields?
-Todd
On Mon, Jan 4, 2010 at 7:33 PM, Zheng Shao zsh...@gmail.com wrote:
Hi Eric,
Most probably there are leading/trailing spaces in the columns that
are defined
REPLACE COLUMNS.
Zheng
On Mon, Jan 4, 2010 at 10:28 PM, Eric Sammer e...@lifeless.net wrote:
On 1/4/10 10:33 PM, Zheng Shao wrote:
Hi Eric,
Most probably there are leading/trailing spaces in the columns that
are defined as int.
If Hive cannot parse the field successfully, the field will be NULL.
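A query-side workaround sketch (hypothetical table t and STRING column
col_str; trim() is a built-in UDF): keep the column as STRING and cast
after trimming:
SELECT CAST(trim(col_str) AS INT) FROM t;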
How were the Chinese words encoded in the file? Is it UTF-8 or GB?
If it's GB, then Hive will have difficulty converting them to Unicode.
Please take a look at Driver.java. There is a method to get the
results as List<String>.
If we get the result as Text (byte array) instead, you can get the
Can you open /tmp/user/hive.log? It should have the full stack trace.
Zheng
On Wed, Dec 23, 2009 at 12:17 PM, Nathan Rasch
nathan.ra...@returnpath.net wrote:
All:
I've been setting up Hive using Derby in Server Mode as per the instructions
here:
Mohan,
Please take a look at /tmp/username/hive.log. It contains the full
stack trace of the problem.
It seems like a configuration problem.
Zheng
On Sun, Dec 20, 2009 at 9:04 PM, Mohan Agarwal
mohan.agarwa...@gmail.com wrote:
Hi,
I have installed hadoop-0.19.2 on my system in a
You are correct.
Just opened https://issues.apache.org/jira/browse/HIVE-1002
This is a highly wanted feature from a lot of users.
Please comment on the JIRA. Let's figure out how we want to do it.
Zheng
On Wed, Dec 16, 2009 at 6:36 PM, ken.barc...@wellsfargo.com wrote:
Correct me if I’m
Sorry about the delay.
Are you using Hive trunk?
Filed https://issues.apache.org/jira/browse/HIVE-1001
We should use (new Path(str)).getPath() instead of chopping off the
first 5 chars.
Zheng
On Mon, Dec 14, 2009 at 4:43 PM, David Lerman dler...@videoegg.com wrote:
I'm running into errors
We plan to run a vote and branch 0.5 around early Jan.
However we do run trunk for some adhoc queries (note that trunk and
branch 0.4 can share the metastore and data on hdfs) and branch 0.4
for production queries.
Hive trunk does support combine file input format but the fix to
hadoop 0.20 was
Try this:
set mapred.map.tasks=28;
Zheng
From: Ryan LeCompte [mailto:lecom...@gmail.com]
Sent: Thursday, December 10, 2009 1:45 PM
To: hive-user@hadoop.apache.org
Subject: Hive not using the full mapper capacity for certain jobs
Hello all,
The cluster has a capacity of 28 concurrent mappers. It
are the list of changes:
HIVE-884. Metastore Server should call System.exit() on error.
(Zheng Shao via pchakka)
HIVE-864. Fix map-join memory-leak.
(Namit Jain via zshao)
HIVE-878. Update the hash table entry before flushing in Group By
hash aggregation (Zheng Shao via namit)
HIVE-882
Yes Ken. Please try http://www.fileformat.info/tool/regex.htm to test
your regex to see if it can match your data or not.
Zheng
On Mon, Nov 30, 2009 at 6:11 PM, ken.barc...@wellsfargo.com wrote:
I looked in hive.log while doing the CREATE TABLE and found:
2009-11-30 17:51:53,240 WARN
:
Hey Zheng,
What do we need to do to fix this? It seems to have bitten a number of
people by now.
-Todd
On Tue, Nov 10, 2009 at 3:50 PM, Zheng Shao zsh...@gmail.com wrote:
I am forwarding an earlier email from the same mailing list; search
for "Downloaded file size doesn't match expected Content Length".
Yes you can compile it into a jar, and insert the command line java
/xxx/my.jar into Hive queries.
http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook
Page 72 has an example
If your map/reduce function is simple, you can probably write a Hive
UDF instead.
In the near future,
Hi Hive users,
Would you please add your company's name and a little description of
how you used Hive on the following wiki page?
This helps new users get more ideas about how Hive can be used and
where it is used now.
http://wiki.apache.org/hadoop/Hive/PoweredBy
--
Yours,
Zheng
I am forwarding an earlier email from the same mailing list; search
for "Downloaded file size doesn't match expected Content Length":
Hi Rahul,
Please follow these steps:
1) In your hive source directory run 'ant clean'.
2) remove the contents of ~/.ant/cache/hadoop/core/sources
3) Download
Hi Massoud,
Once you have done ant package, you will need to go into build/dist and then
run bin/hive.
Zheng
On Fri, Nov 6, 2009 at 11:27 AM, Ning Zhang nzh...@facebook.com wrote:
Sorry there was a typo in my previous email: please replace "testcaes"
with "testcase" in the unit test command. Basically
Hi Bobby,
Can you open a jira and attach a patch?
We can put that to contrib.
Zheng
On 11/5/09, Bobby Rullo bo...@metaweb.com wrote:
Andrey,
Here you go:
http://pastebin.com/m5724ce8a
Bobby
On Nov 5, 2009, at 8:59 AM, Andrey Pankov wrote:
Thanks Bobby. Yeah, could be nice to take a
Hi Mohan,
Most probably there are some exceptions in the process.
Can you take a look at /tmp/user/hive.log ?
Also, for SELECT count(1) Hive should generate a mapreduce job.
Did you see that map-reduce job running? Did the map task and reduce task
run smoothly? Can you take a look at their
Yes, but you would need to set up the Hive MySQL metastore. It's on the Hive
wiki, I believe.
Zheng
On Tue, Nov 3, 2009 at 11:10 PM, Mohan Agarwal mohan.agarwa...@gmail.com wrote:
Hi,
Can I run multiple Hive CLIs from different systems pointing to a common
Hadoop cluster?
Thanking You
Mohan Agarwal
)
CLUSTERED BY(userid) INTO 256 BUCKETS;
Is it possible to specify more than one key in the CLUSTERED BY(...)
clause?
Also, if I am clustering my tables, where/when would I expect to get
improved performance in Hive queries?
Thanks,
Ryan
On Sat, Oct 24, 2009 at 6:56 PM, Zheng Shao zsh
Hi Rahul,
I think you are treating the svn directory as HIVE_HOME. If you do ant
package, HIVE_HOME should be set to build/dist.
Zheng
On Tue, Oct 27, 2009 at 1:19 AM, Rahul Pal rahul@one97.net wrote:
I copied the files (*hadoop-0.19.0.tar.gz and hadoop-0.20.0.tar.gz*) to *
It's probably caused by the Cartesian product of many rows from the two
tables with the same key.
Zheng
On Sun, Oct 25, 2009 at 7:22 PM, Ryan LeCompte lecom...@gmail.com wrote:
It also looks like the reducers just never stop outputting things like the
following (see below), causing them to
Mostly correct.
2. Your idea looks interesting but I would say in reality, the percentage of
tuples purged may not be that large.
4. Hive does NOT treat the partition column differently than others.
5. There is no sort-merge join yet. This would be a great feature to add
onto Hive!
Zheng
We also saw that message. Prasad, do you have any idea on that?
2009-10-22 10:50:48,475 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) -
Bundle org.eclipse.jdt.core requires org.eclipse.core.resources but it
cannot be resolved.
2009-10-22 10:50:48,475 ERROR DataNucleus.Plugin
1) Hive does not support that yet. If you don't want to repeat the
expression, there is a work-around with a subquery:
SELECT myalias, count(1)
FROM (SELECT if(col='x', 1, 0) AS myalias FROM src) tmp -- src: the source table, cut off in the archived snippet
GROUP BY myalias
2) Yes. Comment lines begin with -- . It's the standard SQL comment
format.
Zheng
On
Hi Vinay,
If you are adding a partition of data, you will need to run ALTER TABLE xxx
ADD PARTITION(...).
You can also try https://issues.apache.org/jira/browse/HIVE-142
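For illustration, a sketch with hypothetical table and partition names:
ALTER TABLE pageviews ADD PARTITION (ds='2010-01-04');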
The key part does not matter - both BytesWritable/LongWritable should work.
Hive ignores the data in key.
Zheng
On Thu, Oct
See http://wiki.apache.org/hadoop/Hive/GettingStarted#Metadata_Store
2009/10/18 Clark Yang (杨卓荦) clarkyzl-h...@yahoo.com.cn
When I use
$ hive --service hiveserver
as a service for the JDBC client, and then use
$ hive
any HiveQL on the Hive CLI is unavailable. Why does
Hi Schubert,
I am not an expert on ivy, but is it possible to change the md5 check logic
in ivy?
We might want to support both .md5 file formats. If you are interested in
going this route, please open a JIRA and we can work together on that.
Also, for the time being, you might want to just skip
We use a single instance of the udf for each node in the expression tree.
For example, in a+b, + will be called for all the rows of the table,
and we have a single instance of +.
However, in a+b+c, there will be 2 instances of +.
In a lot of the UDFs, we already do such initializations. Take a
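To make the instance counting concrete, a sketch against a hypothetical
table t:
SELECT a + b,      -- one '+' node, one UDF instance
       a + b + c   -- parsed as (a+b)+c: two '+' nodes, two instances
FROM t;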
Hi Qing,
Talking about high-level design and architecture, I think the ideas proposed
in Hive will help SQL-to-DryadLINQ translation as well.
Hive internally translates the SQL query into a DAG plan which should fit
Dryad - but with the limitation of Hadoop, we have to cut the DAG plan into
The error message from Hive.log:
Caused by: org.datanucleus.exceptions.NucleusException: Plugin (Bundle)
org.eclipse.jdt.core is already registered. Ensure you dont have multiple
JAR versions of the same plugin in the classpath. The URL
/etc/init.d to make sure that
Hadoop would be started automatically.
Is there anything wrong? How can I fix it? Thank you.
--
*From:* Zheng Shao zsh...@gmail.com
*To:* hive-user@hadoop.apache.org
*Sent:* Fri, Oct 16, 2009, 12:20:48 PM
*Subject:* Re: Why still can not I
The AST is built by Hive.g (using ANTLR).
AST to operator tree conversion is done by SemanticAnalyzer.java.
Optimizations are done by Transformer.java and its subclasses.
Hope this helps you get started.
Zheng
On Tue, Oct 13, 2009 at 10:01 AM, bharath v
bharathvissapragada1...@gmail.com wrote:
Thanks for
Hi Vijay,
DynamicSerDe is deprecated.
Please use the following SerDe instead:
https://issues.apache.org/jira/browse/HIVE-662
Can you point us to where you see this example? We should update it with
RegexSerDe.
Zheng
On Mon, Oct 12, 2009 at 4:46 PM, Vijay tec...@gmail.com wrote:
Hi,
I have
Hi Clark,
hive release 0.4.0 is just out. It's compatible with hadoop 0.17 to 0.20
Please download it from
https://svn.apache.org/viewvc/hadoop/hive/tags/release-0.4.0/
Zheng
On Mon, Oct 12, 2009 at 7:46 PM, 杨卓荦 clarkyzl-h...@yahoo.com.cn wrote:
I access to the
)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
... 7 more
Any ideas?
Thanks,
Ryan
On Sat, Oct 10, 2009 at 2:47 PM, Zheng Shao zsh...@gmail.com wrote:
Yes, we can do
Hi Bobby,
We just need a special FileInputFormat: it should be able
to read the SequenceFile and prepend the key to the value before it's
returned to the Hive framework.
Then in Hive language, we can say:
add jar my.jar;
CREATE TABLE mytable (key STRING, value STRING)
STORED
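Presumably the DDL continues along these lines (a sketch; the input format
class is hypothetical, the output format is Hive's standard text output):
CREATE TABLE mytable (key STRING, value STRING)
STORED AS INPUTFORMAT 'com.example.KeyPrependingSequenceFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';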
On Sun, Oct 4, 2009 at 7:24 PM, Zheng Shao zsh...@gmail.com wrote:
+1
Making input data and query results available with a short delay is
definitely a very attractive feature for Hive.
There are multiple approaches to achieve this, mainly depending on how
much
we leverage HBase
, 2009, at 1:28 AM, Zheng Shao wrote:
I got it. You mean TimeSpentQuerying, PageType, TotalRevenue and UserAgent
are all UDFs that takes a JSON object and outputs a STRING.
Exactly!
Allowing such a new object type just for UDFs is simpler than supporting a
new type in all parts of the system
+1
Making input data and query results available with a short delay is definitely
a very attractive feature for Hive.
There are multiple approaches to achieve this, mainly depending on how much
we leverage HBase.
The simplest way to go is to probably have a good Hive/HBase integration
like
this:
CREATE TABLE txn_logs (tid String, txn JSON);
Thanks,
Bobby
On Oct 2, 2009, at 10:54 PM, Zheng Shao wrote:
We have 2 example serdes, one for text data (regexserde), one for
binary data (thriftserde).
But the simplest solution for this is to add a udf get_json_objects
that returns
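The UDF that shipped is get_json_object(json_string, path). A usage sketch
with hypothetical table and column names:
SELECT get_json_object(json_col, '$.store.owner') FROM json_table;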
Hi Tom,
Currently Hive/Hadoop recognizes data as UTF-8.
If your encoding is different, most likely you can still process the data
using Hive without any problems, as long as Hive/Hadoop does not have to do
UTF-8 decoding.
What is the row format of your data? Fields separated by TAB or
Hi,
You might not have Internet access to download the required tgz file for
hive.
You can do ant clean and remove ~/.ivy2 and try again.
Also, now you don't need to specify -Dhadoop.version=... when doing ant
package any more. Hive should directly work with hadoop 0.20 without any
problems.
.crc files are checksum files by hadoop. They can be ignored.
The first 2 files (non-crc) ARE text files - just replace Ctrl-A (^A, ascii
code 1) with TAB, then you get what you want.
Zheng
On Thu, Sep 17, 2009 at 5:52 AM, Avishay Livne avish...@il.ibm.com wrote:
Hi,
I am trying to save a
right?
2009/9/20 Zheng Shao zsh...@gmail.com
Hi,
You might not have Internet access to download the required tgz file for
hive.
You can do ant clean and remove ~/.ivy2 and try again.
Also, now you don't need to specify -Dhadoop.version=... when doing ant
package any more. Hive should
You mean 14 mappers running concurrently, correct?
How many mappers in total for the hive query?
Zheng
On Wed, Sep 16, 2009 at 6:50 AM, Brad Heintz brad.hei...@gmail.com wrote:
There are 14 mappers spawned when I do a Hive query - over 7 nodes. Other
jobs spawn 7 nodes per mapper (total of
On Thu, Sep 10, 2009 at 9:05 PM, Mayuran Yogarajah
mayuran.yogara...@casalemedia.com wrote:
Zheng Shao wrote:
1. Yes, the performance will be affected, especially since we are doing one regex
match per row, as well as creating a lot of String objects. If we define
them as int and use the default row
CAST(CAST(col AS DOUBLE) AS BIGINT)
Zheng
On Thu, Sep 10, 2009 at 10:10 PM, Mayuran Yogarajah
mayuran.yogara...@casalemedia.com wrote:
Zheng Shao wrote:
You need to run "add jar" before running the SELECT command.
Don't copy it to $hadoop_dir/lib; that will make upgrading Hive much
harder.
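A minimal end-to-end sketch (hypothetical jar path, class, and table):
add jar /home/me/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUDF';
SELECT my_udf(col) FROM mytable;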
Hi, please try to open /tmp/username/hive.log and find the last exception.
Zheng
On Sun, Sep 6, 2009 at 12:07 PM, Ryan Rosario uclamath...@gmail.com wrote:
Hi,
I am trying to learn Hive, but am having problems as I am getting a
lot of error messages that do not hint at how to resolve
We can do a try/except in the loop that processes data line by line, to
print out the stack trace in Python, as well as the line of data that caused
the problem.
Zheng
On Sun, Sep 6, 2009 at 8:46 PM, Min Zhou coderp...@gmail.com wrote:
Hi all,
Currently, debugging a streaming script written in other
You can use Hive's TRANSFORM clause to do hadoop-streaming-like custom
map-reduce jobs.
You won't be restricted in any sense.
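For example, a sketch with a hypothetical Python script, shipped to the
cluster via add file:
add file /home/me/my_mapper.py;
SELECT TRANSFORM (col1, col2)
USING 'python my_mapper.py'
AS (out1, out2)
FROM mytable;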
Zheng
On Fri, Sep 4, 2009 at 4:48 PM, Abhijit Pol a...@rocketfuelinc.com wrote:
Hive supports column based storage using RC files.
Yes, see https://issues.apache.org/jira/browse/HIVE-719
Zheng
On Sun, Aug 30, 2009 at 9:03 PM, Eva Tse e...@netflix.com wrote:
We run the following query:
create table test_map (other_properties map<string, string>);
insert overwrite table test_map
select other_properties from log_table
ant -Dtestcase=TestContribCliDriver -Dqfile=dboutput.q test
Zheng
On Wed, Aug 26, 2009 at 8:10 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
I am unable to run a specific ant test on one q file in contrib...
contrib]$ ant -Dtestcase=TestCliDriver -Dqfile=dboutput.q test
This does not
(URLDecoder.java:173)
at com.sharethis.norm.URLDecode.evaluate(URLDecode.java:25)
... 15 more
On Thu, Aug 20, 2009 at 10:48 PM, Zheng Shao zsh...@gmail.com wrote:
http://svn.apache.org/viewvc/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java?revision=806034
Can you
--
http://pastebin.com/m59d5a84b
I don't see compression anywhere.
Saurabh.
On Fri, Aug 21, 2009 at 11:30 AM, Zheng Shao zsh...@gmail.com wrote:
Hi Suarabh,
Sorry for the delay on this. We are busy with the production this week.
I don't think there is much difference in CLI queries and JDBC
I guess you have a lot of small files in the table.
Can you merge those small files into bigger files?
Zheng
On Tue, Aug 25, 2009 at 1:08 PM, Ravi Jagannathan
ravi.jagannat...@nominum.com wrote:
There are too many mappers in Hive. Table has approximately 50K rows,
number of bytes =
difference in how CLI queries and JDBC queries are treated?
Saurabh.
On Tue, Aug 18, 2009 at 11:19 AM, Zheng Shao zsh...@gmail.com wrote:
Hi Saurabh,
So the compression flag is correct when the plan is generated.
When you run the query, you should see plan = xxx.xml in the log
file. Can you open
Can you check the mapred.tmp.dir in your hadoop-site.xml/hadoop-default.xml?
It seems that Hadoop is unable to write to that directory.
Zheng
On Thu, Aug 20, 2009 at 12:30 PM, vinay gupta vingup2...@yahoo.com wrote:
Hello hive users,
I am getting the following exception while running bin/hive
The default log level is WARN. Please change it to INFO.
hive.root.logger=INFO,DRFA
Of course you can also use LOG.warn() in your test code.
Zheng
On Sun, Aug 16, 2009 at 11:58 PM, Saurabh Nanda saurabhna...@gmail.com wrote:
I still can't find the log output anywhere.
The log file is in
, Zheng Shao zsh...@gmail.com wrote:
The default log level is WARN. Please change it to INFO.
hive.root.logger=INFO,DRFA
Of course you can also use LOG.warn() in your test code.
Zheng
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
--
http://nandz.blogspot.com
http
Great. We are one step closer to the root cause.
Can you print out a log line here as well? This is the place that we
fill in the compression option.
SemanticAnalyzer.java:2711:
Operator output = putOpInsertMap(
OperatorFactory.getAndMakeChild(
new fileSinkDesc(queryTmpdir,
(HiveConf.ConfVars.COMPRESSRESULT));
Saurabh.
On Fri, Aug 14, 2009 at 12:39 PM, Zheng Shao zsh...@gmail.com wrote:
Great. We are one step closer to the root cause.
Can you print out a log line here as well? This is the place that we
fill in the compression option.
SemanticAnalyzer.java:2711:
Operator
Hi Saurabh,
hive.exec.compress.output=true is the correct option. Can you post the
insert command that you run which produced non-compressed results?
Is the output in TextFileFormat or SequenceFileFormat?
Zheng
On Wed, Aug 12, 2009 at 10:52 PM, Saurabh Nanda saurabhna...@gmail.com wrote:
I've
SELECT day,
SUM(IF(request like '%foo%', 1, 0)),
SUM(IF(request like '%bar%', 1, 0))
FROM accesslogs
group by day
order by day;
On Thu, Aug 13, 2009 at 4:42 PM, Vijay tec...@gmail.com wrote:
Hi,
I have some questions about using SELECT with UNION. I have a number of
access log files that
Hi Saurabh,
The summary is that we want to make it possible to compile Hive with
one version of Hadoop, and let that compiled package work on any
version of Hadoop.
Please see the following jiras for details:
https://issues.apache.org/jira/browse/HIVE-487
This is not supported right now. Opened HIVE-733 for that
https://issues.apache.org/jira/browse/HIVE-640
Zheng
On Thu, Aug 6, 2009 at 12:12 AM, Andraz Tori and...@zemanta.com wrote:
Is there a possibility of throwing an exception when parsing a specific
line, without causing the whole task to
Hi Neal,
It seems that the exception is thrown at:
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException
Is there any parameter to adjust the s3 file system?
Zheng
On Thu, Aug 6, 2009 at 7:32 AM, Neal Richter nrich...@gmail.com wrote:
Any response on this? We had
Hi Jianfeng,
That is not currently supported, because Hive starts a new JVM to run
the local map-reduce job in local mode.
If you are interested in more details on how to modify it, you can
take a look at the TaskFactory.java:98-102. We just need to always use
ExecDriver (instead of MapRedTask)
I reproduced the bug and opened HIVE-718.
Zheng
On Fri, Jul 31, 2009 at 6:01 PM, Eva Tse e...@netflix.com wrote:
We discovered a problem where loading into a new partition in hive w/o
specifying ‘overwrite’ doesn’t work.
If the specified partition does not exist yet, running the following
Hi Andraz,
It's not supported right now.
This will be supported when Hive moves to hadoop 0.20 (with multi-file
inputformat).
An alternative is to merge these smaller files via a query like this:
set hive.exec.reducers.bytes.per.reducer=10;
INSERT OVERWRITE TABLE t
SELECT * FROM t
DISTRIBUTE
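The query above is cut off in the archive; a guess at its complete shape,
with rand() standing in only for the truncated DISTRIBUTE BY expression:
INSERT OVERWRITE TABLE t
SELECT * FROM t
DISTRIBUTE BY rand();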
Yes we do compress all tables.
Zheng
On Mon, Jul 27, 2009 at 11:08 PM, Saurabh Nanda saurabhna...@gmail.com wrote:
In our setup, we didn't change io.seqfile.compress.blocksize (1MB) and
it's still fairly good.
You are free to try 100MB for better compression ratio, but I would
recommend to
Can you send the output of these 2 commands?
describe extended ApiUsage;
describe extended ApiUsageTemp;
Zheng
On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham billgra...@gmail.com wrote:
Thanks for the tip, but it fails in the same way when I use a string.
On Tue, Jul 28, 2009 at 6:21 PM, David
Hi Vijay,
What version of Hive are you using?
Can you attach /tmp/your_unix_user/hive.log so we can see what might
be happening?
Zheng
On Mon, Jul 27, 2009 at 1:24 PM, Vijay tec...@gmail.com wrote:
Hi,
I'm pretty new to hadoop/hive. I have everything running pretty good on a
single server. I
Hi Andraz,
I just opened a JIRA for AWS S3 log format.
Can you attach a patch file to: https://issues.apache.org/jira/browse/HIVE-693 ?
For your question, I think the approach suggested by David Lerman
should work fine.
Thanks,
Zheng
On Mon, Jul 27, 2009 at 11:35 AM, Andraz
I cannot imagine there is such a huge compression ratio difference. On
our side, the compression ratio of gzip and GzipCodec (BLOCK) are
within 10% relative difference.
Log file compression ratio is usually 5x to 15x, so 250MB looks like a good one.
The 1600MB number looks like record-level
Hi Saurabh,
The right configuration parameter is:
set mapred.output.compression.type=BLOCK;
Sorry about pointing you to the wrong configuration parameter.
Zheng
On Mon, Jul 27, 2009 at 10:02 PM, Saurabh Nanda saurabhna...@gmail.com wrote:
The 1600MB number looks like record-level compression.
In our setup, we didn't change io.seqfile.compress.blocksize (1MB) and
it's still fairly good.
You are free to try 100MB for better compression ratio, but I would
recommend to keep the default setting to minimize the possibilities of
hitting unknown bugs.
Zheng
On Mon, Jul 27, 2009 at 10:38 PM,
Hi Saurabh,
If you want to load data (in compressed/uncompressed text format) into
a table, you have to define the table as "stored as textfile" instead
of "stored as sequencefile".
Can you try again and let us know?
Zheng
On Sat, Jul 25, 2009 at 3:05 AM, Saurabh Nanda saurabhna...@gmail.com
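A sketch with hypothetical names (gzipped text files can be loaded into a
TEXTFILE table as-is):
CREATE TABLE raw_logs (line STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/tmp/logs.txt.gz' INTO TABLE raw_logs;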
Both TextFile and SequenceFile can be compressed or uncompressed.
TextFile means the plain text file (records delimited by \n).
Compressed TextFiles are just text files compressed by gzip or bzip2
utility.
SequenceFile is a special file format that only Hadoop can understand.
Since your files
If there is any reason that the Apache log format cannot be changed
(for example, Hive is not the only consumer),
you might want to try the RegexSerDe that was added to Hive
several days ago:
http://issues.apache.org/jira/browse/HIVE-662
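For reference, a sketch adapted from the HIVE-662 example, trimmed to the
first seven fields of an Apache access log (the SerDe lived in contrib at
the time):
CREATE TABLE apachelog (
  host STRING, identity STRING, user STRING, time STRING,
  request STRING, status STRING, size STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)"
)
STORED AS TEXTFILE;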
Zheng
On Sat, Jul 25, 2009 at 6:37 AM, Saurabh
Sorry about the delay on this.
Here are several example SerDes that got added to the code base recently:
RegexSerDe: A SerDe for parsing text using regex (and an example for
parsing Apache Log using a regex)
https://issues.apache.org/jira/browse/HIVE-167
If the huge file is already on HDFS (load data WITHOUT local), Hive
will just *move* the file into the table (NOTE: that means the user won't
be able to see the file in its original directory afterwards).
If you don't want that to happen, you might want to use CREATE
EXTERNAL TABLE LOCATION
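A sketch with a hypothetical HDFS path; the files stay in place, and
dropping the table won't delete them:
CREATE EXTERNAL TABLE ext_logs (line STRING)
LOCATION '/user/me/logs';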
Hi Saurabh,
Sorry for the late reply.
You can create a table using this:
https://issues.apache.org/jira/browse/HIVE-637
And then use the newly added UDF: https://issues.apache.org/jira/browse/HIVE-642
to read in the data.
In this way, you won't need to write any Java code. Let us know if you
Hi Ray,
This error usually happens if Hive.g is updated but ant clean is not
run before ant package.
Can you try ant clean and rebuild the code?
Zheng
On Mon, Jul 20, 2009 at 7:57 AM, Ray Duong ray.du...@gmail.com wrote:
Hi Tim,
I'm still getting an error, when specifying the column names.
There is some work along this direction in Hadoop land, but it's
not committed yet:
https://issues.apache.org/jira/browse/HADOOP-4012
For the short term, we won't be able to split bzip files.
If your bzip files are generated outside of hadoop, please split the
files before doing compression
Yes that's exactly what we do here.
See my reply in an earlier email.
Zheng
On Tue, Jul 21, 2009 at 10:38 PM, Saurabh Nanda saurabhna...@gmail.com wrote:
Currently, hive only supports overwriting the data – appending is not
supported
Never ran into this problem till now, but will soon begin
Hi Saurabh,
Hive supports both UDF and GenericUDF.
UDFs are much easier to write, but they are currently limited to working with
primitive types (including String).
GenericUDF supports advanced features including complex type
parameters/return values, short-circuit computation, complete object
reuse
Hi Edward,
We currently don't allow UDF/GenericUDF to output multiple rows with a
single call to evaluate(...).
Is that feature going to block you?
Also, if you expect an argument to be a String, we should do:
1. Check the type of that argument in initialize by doing:
if (!(arguments[i]
Hi Min,
Try removing build/ql/tmp/hive.log and rerun the test by specifying the
-Dtestcase=xxx.
If it's TestCliDriver, you can specify -Dqfile=yyy.q for running only
that qfile, and -Dtest.silent=false will produce more error messages
for map-reduce jobs.
Zheng
On Tue, Jul 14, 2009 at 11:47 PM,
https://issues.apache.org/jira/browse/HIVE
Click "Create New Issue".
On Mon, Jul 13, 2009 at 11:50 PM, Saurabh Nanda saurabhna...@gmail.com wrote:
This command does not take quotations for some historical reasons. We
will fix it in the future.
Where do I add a bug for this?
Saurabh.
--
Hi Saurabh,
Hive does not have a native date/time data type. In all the cases that
we have seen, storing timestamp as BIGINT or STRING is good enough for
our users' applications.
There is a set of UDFs for date/time stored as bigint/string:
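For example, against a hypothetical table events with a BIGINT
Unix-timestamp column ts:
SELECT from_unixtime(ts), to_date(from_unixtime(ts)), year(from_unixtime(ts))
FROM events;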
Hi Min,
Can you try ant clean and rebuild the project? It's possible that
some files are still referencing the old JPOX jars.
Zheng
On Sun, Jul 12, 2009 at 7:46 PM, Min Zhou coderp...@gmail.com wrote:
I've replaced my hive-default.xml with the new one, and came across the same
exception.
The
Hi David,
Thanks for letting us know. I will take a look now.
In the meanwhile, there is a fix related to types
https://issues.apache.org/jira/browse/HIVE-624 which might solve the
problem.
You might want to try it out.
Zheng
On Fri, Jul 10, 2009 at 11:28 AM, David Lerman dler...@videoegg.com
(TaskTracker.java:2198)
Dave
On 7/10/09 3:17 PM, Zheng Shao zsh...@gmail.com wrote:
Hi David,
Thanks for letting us know. I will take a look now.
In the meanwhile, there is a fix related to types
https://issues.apache.org/jira/browse/HIVE-624 which might solve the
problem.
You might want
operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 2 errors
On 7/10/09 4:42 PM, Zheng Shao zsh...@gmail.com wrote:
Hi David,
Please do ant clean before running ant -Dhadoop.version=0.18.x package
Zheng
On Fri, Jul 10, 2009 at 12:45 PM, David Lerman dler
Hi Eva,
Can you give us two lines of the data so that we can debug?
Also, what does select count(1) from tablename return?
Zheng
On Thu, Jul 9, 2009 at 5:38 PM, Eva Tse e...@netflix.com wrote:
When we load the output generated by the reducer to hive, we run into some
issues with ‘is NULL’
However, UTF-8 is hard-coded in a lot of places in Hive (actually,
also hadoop, see Text.java)
If you want to use a different encoding like GBK, we will probably
need to extract that UTF-8 out from all the code.
Zheng
On Wed, Jul 8, 2009 at 12:03 AM, Zheng Shao zsh...@gmail.com wrote:
Hi Min,
It seems hadoop 0.20 RunJar.java does not like the -libjars option.
Can you try removing the "-libjars xxx" from the command line?
It's added somewhere in bin/hive (or scripts that got called by bin/hive)
Zheng
On Wed, Jul 8, 2009 at 3:45 PM, Eva Tse e...@netflix.com wrote:
We set the env variable
Hi Rakesh,
Your analysis is correct overall.
The specification of delimiters in the DDL statement (create table ...) was
invented when we only allowed a single level of list or map.
If there are multiple levels, these delimiter specifications won't
work as you expect.
For now, please do the following
Hi Rakesh,
Internally Hive does support lists of maps, and even more nested levels.
However, we didn't open it up for the create table command yet. That
shouldn't be hard to do though.
We will require the list to be separated by ^B, and map items to be
separated by ^C, while map key and value