[
https://issues.apache.org/jira/browse/HIVEMALL-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756606#comment-16756606
]
ASF GitHub Bot commented on HIVEMALL-233:
-----------------------------------------
myui commented on pull request #181: [HIVEMALL-233-2] RandomForest regressor
accepts sparse vector input
URL: https://github.com/apache/incubator-hivemall/pull/181
## What changes were proposed in this pull request?
Enable RandomForestRegressor to accept sparse vector input as
RandomForestClassifier already does.
This closes #178
## What type of PR is it?
Improvement
## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-233
## How was this patch tested?
manual tests on EMR
## How to use this feature?
```sql
with customers as (
  select 1 as id, "male" as gender, 23 as age, "Japan" as country, 12 as num_purchases
  union all
  select 2 as id, "female" as gender, 43 as age, "US" as country, 4 as num_purchases
  union all
  select 3 as id, "other" as gender, 19 as age, "UK" as country, 2 as num_purchases
  union all
  select 4 as id, "male" as gender, 31 as age, "US" as country, 20 as num_purchases
  union all
  select 5 as id, "female" as gender, 37 as age, "Australia" as country, 9 as num_purchases
),
training as (
  select
    array_concat(
      quantitative_features(
        array("age"),
        age
      ),
      categorical_features(
        array("country", "gender"),
        country, gender
      )
    ) as features,
    num_purchases
  from
    customers
)
select
  train_randomforest_regressor(
    feature_hashing(features), -- feature vector
    num_purchases,             -- target value
    '-trees 40 -seed 31'       -- hyper-parameters
  )
from
  training
;
```
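To check what the regressor receives as input, a query along the following lines can be run before training. This is only a sketch: it reuses the functions from the example above, trims `customers` to two rows for brevity, and the `hashed` column alias is introduced here for illustration. The hashed indices are not listed because they depend on the hash function.

```sql
-- Sketch only: compare the named features with their hashed, sparse form.
with customers as (
  select 1 as id, "male" as gender, 23 as age, "Japan" as country, 12 as num_purchases
  union all
  select 2 as id, "female" as gender, 43 as age, "US" as country, 4 as num_purchases
),
training as (
  select
    id,
    array_concat(
      quantitative_features(array("age"), age),
      categorical_features(array("country", "gender"), country, gender)
    ) as features
  from
    customers
)
select
  id,
  features,                           -- named features built from the columns
  feature_hashing(features) as hashed -- same features with hashed indices (sparse vector)
from
  training
;
```

Each row of `hashed` is the sparse vector that is passed as the first argument of `train_randomforest_regressor` in the training query.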
## Checklist
- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for
your commit?
- [ ] Did you run system tests on Hive (or Spark)?
> RandomForest regressor accepts sparse vector input
> --------------------------------------------------
>
> Key: HIVEMALL-233
> URL: https://issues.apache.org/jira/browse/HIVEMALL-233
> Project: Hivemall
> Issue Type: Improvement
> Reporter: Takuya Kitazawa
> Assignee: Takuya Kitazawa
> Priority: Major
>
> While HIVEMALL-75 enabled RandomForestClassifier to accept sparse vectors
> as input, some crucial code in the classifier is not properly implemented
> in its regressor counterpart: the regressor and the classifier process the
> input feature vector differently.
> This ticket follows up on HIVEMALL-75 so that the regressor behaves like
> the classifier.