[ https://issues.apache.org/jira/browse/BIGTOP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062987#comment-14062987 ]
jay vyas commented on BIGTOP-1272:
----------------------------------
Finished reading the patch, and it looks like it's all there.
But after running the Mahout integration test, I get an
{{InvalidInputException}}. I suspect this is related to the output formatting of
the Pig stage, but I'm not sure:
{noformat}
Command line arguments: {--alpha=[0.8], --endPhase=[2147483647], --implicitFeedback=[false], --input=[bps_integration_/cleaned/Mahout], --lambda=[0.1], --numFeatures=[2], --numIterations=[5], --numThreadsPerSolver=[1], --output=[bps_integration_/Mahout/AlsFactorization], --startPhase=[0], --tempDir=[/tmp/mahout_1405475399824]}
Command line arguments: {--alpha=[0.8], --endPhase=[2147483647], --implicitFeedback=[false], --input=[bps_integration_/cleaned/Mahout], --lambda=[0.1], --numFeatures=[2], --numIterations=[5], --numThreadsPerSolver=[1], --output=[bps_integration_/Mahout/AlsFactorization], --startPhase=[0], --tempDir=[/tmp/mahout_1405475399824]}
mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
Cleaning up the staging area file:/tmp/hadoop-bigpetstore/mapred/staging/bigpetstore1132639609/.staging/job_local1132639609_0002
Cleaning up the staging area file:/tmp/hadoop-bigpetstore/mapred/staging/bigpetstore1132639609/.staging/job_local1132639609_0002
org.apache.bigtop.bigpetstore.BigPetStoreMahoutIT > testPetStorePipeline FAILED
    org.apache.hadoop.mapreduce.lib.input.InvalidInputException at BigPetStoreMahoutIT.java:69
1 test completed, 1 failed
:integrationTest FAILED
{noformat}
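For what it's worth: with Hadoop's {{FileInputFormat}}, an {{InvalidInputException}} at job submission usually means an input path does not exist (or a glob matched no files), not that the file contents are malformed; an existing but empty directory yields zero splits instead. Since the log shows the local job runner ({{job_local...}}, {{file:/tmp/...}}), the relative {{--input}} path {{bps_integration_/cleaned/Mahout}} is resolved against the local working directory. Below is a minimal pre-flight sketch, assuming the local filesystem and using only {{java.nio.file}}; the class and method names are hypothetical, not part of BigPetStore or Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical pre-flight check for the Mahout ALS input (local filesystem only).
class MahoutInputPreflight {

    // Approximates the files a non-recursive listing of `input` would feed to the job:
    // regular files whose names do not start with "_" or "." (e.g. _SUCCESS, .crc files),
    // mirroring FileInputFormat's hidden-file filter. Throws if the path is missing,
    // which is the situation FileInputFormat reports as InvalidInputException.
    static List<Path> visibleInputFiles(Path input) throws IOException {
        if (!Files.exists(input)) {
            throw new IOException("Input path does not exist: " + input.toAbsolutePath());
        }
        List<Path> result = new ArrayList<>();
        if (Files.isRegularFile(input)) {
            result.add(input);
            return result;
        }
        try (Stream<Path> children = Files.list(input)) {
            children
                .filter(p -> {
                    String name = p.getFileName().toString();
                    return !name.startsWith("_") && !name.startsWith(".");
                })
                .filter(Files::isRegularFile)
                .sorted()
                .forEach(result::add);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Default is the --input value from the failing run; relative to the working directory.
        Path input = Paths.get(args.length > 0 ? args[0] : "bps_integration_/cleaned/Mahout");
        List<Path> files = visibleInputFiles(input);
        if (files.isEmpty()) {
            System.out.println("Input exists but has no visible data files: " + input);
        } else {
            files.forEach(f -> System.out.println("input file: " + f));
        }
    }
}
```

Run from the integration test's working directory. If it throws, the Pig stage probably did not run or wrote its output somewhere other than {{bps_integration_/cleaned/Mahout}}, which would fit this exception better than a formatting problem.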
Will dive some more.
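If the input path does exist, the Pig formatting suspicion can be checked on its own. My understanding (an assumption to verify against the Mahout version in use) is that Mahout's ALS job reads text lines of the form {{userID,itemID,rating}}, split on a comma or a tab, with both IDs parsed as longs and the rating as a float. A malformed line would then more likely surface as a {{NumberFormatException}} inside a map task than as an {{InvalidInputException}}. A minimal line validator sketch; {{PreferenceLineCheck}} and {{problemWith}} are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical validator for the Pig output consumed by the Mahout ALS job.
class PreferenceLineCheck {

    // Assumed delimiter: a single comma or tab between fields.
    private static final Pattern DELIMITER = Pattern.compile("[\t,]");

    // Returns null if the line looks like "userID,itemID,rating", otherwise a reason string.
    static String problemWith(String line) {
        String[] tokens = DELIMITER.split(line, -1);
        if (tokens.length < 3) {
            return "expected 3 fields, found " + tokens.length;
        }
        try {
            Long.parseLong(tokens[0]);
        } catch (NumberFormatException e) {
            return "user ID is not a long: '" + tokens[0] + "'";
        }
        try {
            Long.parseLong(tokens[1]);
        } catch (NumberFormatException e) {
            return "item ID is not a long: '" + tokens[1] + "'";
        }
        try {
            Float.parseFloat(tokens[2]);
        } catch (NumberFormatException e) {
            return "rating is not a number: '" + tokens[2] + "'";
        }
        return null;
    }

    public static void main(String[] args) {
        // Sample lines only; in practice, feed it the part-* files written by the Pig stage.
        String[] samples = {"17,42,1.0", "17\t42\t1", "jay,42,1.0", "17,42"};
        for (String s : samples) {
            String problem = problemWith(s);
            System.out.println(s.replace("\t", "\\t") + " -> " + (problem == null ? "ok" : problem));
        }
    }
}
```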
> BigPetStore: Productionize the Mahout recommender
> -------------------------------------------------
>
> Key: BIGTOP-1272
> URL: https://issues.apache.org/jira/browse/BIGTOP-1272
> Project: Bigtop
> Issue Type: New Feature
> Components: Blueprints
> Affects Versions: backlog
> Reporter: jay vyas
> Attachments: BIGTOP-1272.patch, BIGTOP-1272.patch, BIGTOP-1272.patch, arch.jpeg
>
>
> BIGTOP-1271 adds patterns into the data that guarantee that a meaningful
> type of product recommendation can be given for at least *some* customers,
> since we know that there will be many customers who bought only 1
> product, and also customers who bought 2 or more products, even in a
> dataset of size 10, due to the Gaussian distribution of purchases that is
> also in the dataset generator.
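The reasoning above (a Gaussian number of purchases per customer leaves both single-product and multi-product buyers, even in a tiny dataset) can be illustrated with a small simulation. This is only a sketch: it assumes the generator draws each customer's purchase count from a Gaussian, rounds it and clamps it to at least one; the mean, standard deviation and seed are hypothetical, not BigPetStore's actual generator settings. Customers with two or more purchases are the ones whose co-purchases give the recommender a pattern to learn from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: Gaussian purchase counts and the single- vs multi-product split.
class PurchaseCountSketch {

    // Draws one purchase count per customer from N(mean, stddev^2), rounded and clamped to >= 1.
    static int[] simulateCounts(int customers, double mean, double stddev, long seed) {
        Random rng = new Random(seed);
        int[] counts = new int[customers];
        for (int i = 0; i < customers; i++) {
            long drawn = Math.round(mean + stddev * rng.nextGaussian());
            counts[i] = (int) Math.max(1, drawn);
        }
        return counts;
    }

    // Indices of customers who bought at least two products.
    static List<Integer> multiProductCustomers(int[] counts) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] >= 2) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative parameters for a 10-customer dataset; not the generator's real settings.
        int[] counts = simulateCounts(10, 2.0, 1.0, 42L);
        List<Integer> multi = multiProductCustomers(counts);
        System.out.println("purchase counts: " + Arrays.toString(counts));
        System.out.println("single-product customers: " + (counts.length - multi.size()));
        System.out.println("multi-product customers: " + multi.size());
    }
}
```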
> The current Mahout recommender code is statically valid: it runs to
> completion in local unit tests if a Hadoop 1.x tarball is present, but
> otherwise it hasn't been tested at scale. So, let's get it working. This
> JIRA will also comprise:
> - deciding whether to use the Hadoop 2.x build of Mahout for unit tests (the
> default in the Mahout Maven repo is the Hadoop 1.x build), and whether or not
> Bigtop should host a Hadoop 2.x Mahout jar. After all, Bigtop builds a Hadoop
> 2.x Mahout jar as part of its packaging process, and BigPetStore might thus
> need that jar in order to test against the right set of Bigtop releases.
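Why the Mahout build matters: Hadoop 2.x turned several {{org.apache.hadoop.mapreduce}} types, such as {{JobContext}} and {{TaskAttemptContext}}, from classes into interfaces, so a Mahout jar compiled against Hadoop 1.x can fail on a Hadoop 2.x runtime with an {{IncompatibleClassChangeError}}. Below is a minimal sketch of a guard a test could run before picking a Mahout jar. It is a simplification that only compares major versions; the class and method names are hypothetical, and the runtime version string would come from something like Hadoop's {{VersionInfo.getVersion()}}.

```java
// Hypothetical guard: does a Mahout build match the Hadoop major version it will run on?
class MahoutHadoopCompat {

    // Parses the leading major version from strings like "2.4.1" or "1.2.1".
    static int majorVersion(String version) {
        String trimmed = version.trim();
        int dot = trimmed.indexOf('.');
        String major = dot < 0 ? trimmed : trimmed.substring(0, dot);
        return Integer.parseInt(major);
    }

    // True when the Hadoop line the Mahout jar was compiled against matches the runtime.
    // A 1.x-compiled jar on a 2.x runtime is the case that can hit IncompatibleClassChangeError.
    static boolean compatible(String mahoutBuiltAgainst, String hadoopRuntime) {
        return majorVersion(mahoutBuiltAgainst) == majorVersion(hadoopRuntime);
    }

    public static void main(String[] args) {
        // Illustrative version strings only.
        System.out.println(compatible("1.2.1", "2.4.1")); // false: needs a Hadoop 2.x build of Mahout
        System.out.println(compatible("2.4.1", "2.2.0")); // true
    }
}
```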
--
This message was sent by Atlassian JIRA
(v6.2#6252)