GitHub user helenahm commented on the issue:
https://github.com/apache/incubator-hivemall/pull/93
"Yeah, so if you have a lot of training data then running out of memory is
one symptom you run into, but that is not the actual problem of this
implementation."
- It was a big problem for me when running on Hadoop, and that is why I had
to alter the training code.
- The newer version of the code is as bad as the old one from this point of
view.
"The actual cause is that it won't scale beyond one machine."
- Yes, that is why I really like what the Hivemall project is about, and that
is why I needed MaxEnt for Hive.
"In case you manage to make this run with much more data the time it will
take to run will be uncomfortably high."
- That is why I tested my new implementation on almost 100 million
training samples and saw each of the 302 mappers finish its work in very
reasonable time.