Rohit Jain created TRAFODION-2714:
-------------------------------------
Summary: odb does not load data correctly
Key: TRAFODION-2714
URL: https://issues.apache.org/jira/browse/TRAFODION-2714
Project: Apache Trafodion
Issue Type: Bug
Components: db-utility-odb
Affects Versions: 2.3-incubating
Environment: Internal Esgyn system nap043 and Windows laptop
Reporter: Rohit Jain
I was trying to load a trafodion table via odb but I had rearranged the
sequence of the columns to be loaded. I used a map file to load the data. I
tried to load just 10 rows and verified that the source and target data were in
fact the same, before loading the entire table. To my surprise odb had loaded
all columns correctly but two. It had set the values of one of those two
columns to zero and had set the value of the first column to what should have
been the value in the other column.
When I have the columns in the table arranged in the same sequence as those in
the csv file, and don't have a mapping table, since it is not needed in that
case, everything loads correctly.
On nap043 in /disk1/rohit you will find movie_metadata.csv.
The table I loaded the data into is:
invoke movies_denorm;
..
-- Definition of Trafodion table TRAFODION.MOVIES.MOVIES_DENORM
-- Definition current Tue Aug 15 13:15:13 2017
(
SYSKEY LARGEINT NO DEFAULT NOT NULL NOT DROPPABLE
NOT SERIALIZED
, MOVIE_TITLE CHAR(100) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, TITLE_YEAR SMALLINT DEFAULT NULL NOT SERIALIZED
, CONTENT_RATING CHAR(10) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, IMDB_SCORE NUMERIC(3, 1) DEFAULT NULL NOT SERIALIZED
, NUM_CRITIC_FOR_REVIEWS SMALLINT DEFAULT NULL NOT SERIALIZED
, NUM_USER_FOR_REVIEWS SMALLINT DEFAULT NULL NOT SERIALIZED
, NUM_VOTED_USERS INT DEFAULT NULL NOT SERIALIZED
, MOVIE_FACEBOOK_LIKES INT DEFAULT NULL NOT SERIALIZED
, CAST_TOTAL_FACEBOOK_LIKES INT DEFAULT NULL NOT SERIALIZED
, DURATION SMALLINT DEFAULT NULL NOT SERIALIZED
, BUDGET LARGEINT DEFAULT NULL NOT SERIALIZED
, GROSS INT DEFAULT NULL NOT SERIALIZED
, COLOR CHAR(16) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, ASPECT_RATIO NUMERIC(4, 2) DEFAULT NULL NOT SERIALIZED
, COUNTRY CHAR(20) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, LANGUAGE CHAR(10) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, FACENUMBER_IN_POSTER SMALLINT DEFAULT NULL NOT SERIALIZED
, GENRES CHAR(100) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, PLOT_KEYWORDS CHAR(150) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, MOVIE_IMDB_LINK CHAR(100) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, ACTOR_1_NAME CHAR(35) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, ACTOR_1_FACEBOOK_LIKES INT DEFAULT NULL NOT SERIALIZED
, ACTOR_2_NAME CHAR(35) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, ACTOR_2_FACEBOOK_LIKES INT DEFAULT NULL NOT SERIALIZED
, ACTOR_3_NAME CHAR(35) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, ACTOR_3_FACEBOOK_LIKES INT DEFAULT NULL NOT SERIALIZED
, DIRECTOR_NAME CHAR(35) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, DIRECTOR_FACEBOOK_LIKES INT DEFAULT NULL NOT SERIALIZED
)
I used the latest version of odb from my Windows laptop using the latest
version of the JDBC driver. Here is the load command:
odb -u trafodion -p traf123 -d esgyndb -l
src=movie_metadata.csv:tgt=TRAFODION.MOVIES.MOVIES_ORIG:skip=1:loadcmd=UL:truncate:map=movies_odb_column_map.txt
mapping:
COLOR:0
DIRECTOR_NAME:1
NUM_CRITIC_FOR_REVIEWS:2
DURATION:3
DIRECTOR_FACEBOOK_LIKES:4
ACTOR_3_FACEBOOK_LIKES:5
ACTOR_2_NAME:6
ACTOR_1_FACEBOOK_LIKES:7
GROSS:8
GENRES:9
ACTOR_1_NAME:10
MOVIE_TITLE:11
NUM_VOTED_USERS:12
CAST_TOTAL_FACEBOOK_LIKES:13
ACTOR_3_NAME:14
FACENUMBER_IN_POSTER:15
PLOT_KEYWORDS:16
MOVIE_IMDB_LINK:17
NUM_USER_FOR_REVIEWS:18
LANGUAGE:19
COUNTRY:20
CONTENT_RATING:21
BUDGET:22
TITLE_YEAR:23
ACTOR_2_FACEBOOK_LIKES:24
IMDB_SCORE:25
ASPECT_RATIO:26
MOVIE_FACEBOOK_LIKES:27
In the csv file you can see that the 3rd column in the first two rows is
num_critic_for_reviews and has the values 723 and 302. odb uses a column
offset starting from zero, which for most folks like me is very confusing.
This column after the load has zeros in all the 10 sample rows I loaded.
Also, the last entry in the map list is movie_facebook_likes. This column has
the values 33,000 and 0 in the csv column. In the table this column has the
values 723 and 302.
So, the values that should have been in the column num_critics_for_review are
in movie_facebook_likes, and the num_critics_for_review has the value zero in
all rows. All other columns have the correct values in them.
color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,movie_title,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Color,James Cameron,723,178,0,855,Joel David
Moore,1000,760505847,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,AvatarĀ
,886204,4834,Wes
Studi,0,avatar|future|marine|native|paraplegic,http://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1,3054,English,USA,PG-13,237000000,2009,936,7.9,1.78,33000
Color,Gore Verbinski,302,169,563,1000,Orlando
Bloom,40000,309404152,Action|Adventure|Fantasy,Johnny Depp,Pirates of the
Caribbean: At World's EndĀ ,471220,48350,Jack Davenport,0,goddess|marriage
ceremony|marriage
proposal|pirate|singapore,http://www.imdb.com/title/tt0449088/?ref_=fn_tt_tt_1,1238,English,USA,PG-13,300000000,2007,5000,7.1,2.35,0
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)