Hi, I am new to Hadoop and need some clarification.

a) How can I automate running Map/Reduce jobs and loading data into Hive? Do I need to create a cron job, or is there a better way?
b) I have two tables as the source for the M/R jobs: OrderMaster and OrderDetail.

OrderMaster has the order header columns (OrderId, CustId, PaymentMethod, DeliveryMethod, etc.).
OrderDetail has each order's item-level information (OrderId, ItemId, Quantity, SalesPrice, CostPrice, DeliveryAddress, DeliveryState, DeliveryZip, DeliveryCountry).

The relation between OrderMaster and OrderDetail is one-to-many, and OrderId is the key. If I generate a tab-delimited file from each table, how will the Reduce step aggregate the data from OrderDetail, for example if I have to sum the OrderRevenue by order?

Thanks,
Deepak
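
P.S. To make question (b) concrete, here is a rough sketch of the aggregation I have in mind. It is plain Python that only imitates the map, shuffle/sort and reduce phases in memory; it is not actual Hadoop code. It assumes revenue per item is Quantity * SalesPrice and that the tab-delimited OrderDetail columns come in the order listed above. The function names and the sample rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_order_detail(line):
    """Map step: emit (OrderId, item revenue) for one OrderDetail row.

    Assumed column order: OrderId, ItemId, Quantity, SalesPrice, ...
    Assumed revenue definition: Quantity * SalesPrice.
    """
    fields = line.rstrip("\n").split("\t")
    order_id = fields[0]
    quantity = int(fields[2])
    sales_price = float(fields[3])
    return order_id, quantity * sales_price


def reduce_order_revenue(order_id, revenues):
    """Reduce step: receives every value emitted for one OrderId and sums them."""
    return order_id, sum(revenues)


def run_job(lines):
    """Imitate the framework: map each row, sort by key, then reduce per key."""
    mapped = sorted((map_order_detail(line) for line in lines), key=itemgetter(0))
    return dict(
        reduce_order_revenue(order_id, (revenue for _, revenue in group))
        for order_id, group in groupby(mapped, key=itemgetter(0))
    )


# Hypothetical sample rows (only the first four columns are filled in).
sample = [
    "1001\tA1\t2\t10.0",
    "1002\tB7\t1\t5.5",
    "1001\tC3\t3\t4.0",
]
totals = run_job(sample)
```

My understanding is that the framework does the grouping by OrderId between the two steps, so the reducer only has to add up the values it is handed for each key. Please correct me if that is wrong.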
