Daniel Papp created HIVE-17487:
----------------------------------

             Summary: Example fails on the Hive Getting started page
                 Key: HIVE-17487
                 URL: https://issues.apache.org/jira/browse/HIVE-17487
             Project: Hive
          Issue Type: Bug
            Reporter: Daniel Papp
            Priority: Trivial


The [Hive Getting Started|https://cwiki.apache.org/confluence/display/Hive/GettingStarted] 
page has an example that uses the MovieLens 100k dataset. The mapper is 
defined as a Python script in the following way:

{code}
import sys
import datetime

for line in sys.stdin:
  line = line.strip()
  userid, movieid, rating, unixtime = line.split('\t')
  weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
  print '\t'.join([userid, movieid, rating, str(weekday)])
{code}

This is correct only under Python 2, where print is a statement. Under 
Python 3 the last line raises a SyntaxError, so the example fails. The 
following code works with both Python 2 and Python 3:

{code}
from __future__ import print_function
import sys
import datetime

for line in sys.stdin:
  line = line.strip()
  userid, movieid, rating, unixtime = line.split('\t')
  weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
  print('\t'.join([userid, movieid, rating, str(weekday)]))
{code}

I think the example on the wiki page should be replaced with the version 
that works on both.
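
For anyone who wants to check the mapping logic outside Hive, here is a minimal sketch that moves the per-line work into a function. The name map_line and the sample row are only illustrative assumptions and are not part of the wiki example. The weekday depends on the local time zone, because fromtimestamp() is used exactly as in the script above, so no expected value is shown.

```python
from __future__ import print_function

import datetime


def map_line(line):
    # Hypothetical helper (not in the wiki example): converts one
    # tab-separated "userid, movieid, rating, unixtime" row into
    # "userid, movieid, rating, weekday".
    userid, movieid, rating, unixtime = line.strip().split('\t')
    # fromtimestamp() interprets the timestamp in the local time zone,
    # the same as the original mapper does.
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    return '\t'.join([userid, movieid, rating, str(weekday)])


# Illustrative row in the same four-column layout the mapper expects.
sample = "196\t242\t3\t881250949\n"
print(map_line(sample))
```

To use this as the actual mapper, the last two lines would be replaced by a loop over sys.stdin that prints map_line(line) for each line, which is what the script above does inline.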



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
