yifeim commented on issue #15428: Dataloader does not support sparse data
URL: https://github.com/apache/incubator-mxnet/issues/15428#issuecomment-508835732

The vanilla sparse format lacks sufficient information for applications such as recommendation. Many extensions exist, covering group-wise ranking loss, additional field identifiers, and other delimiters such as pipe marks. Here are some examples:

1. Group-wise ranking loss

vw allows auxiliary labels and [shared information among groups](https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Contextual-Bandit-algorithms)
```
shared | s_1 s_2
0:1.0:0.5 | a:1 b:1 c:1
| a:0.5 b:2 c:1
```
xgboost accepts a [`.group` file](https://xgboost.readthedocs.io/en/latest/tutorials/input_format.html#group-input-format) that records how many rows belong to each ranking group
```
2
3
```

2. Multi-field features

libffm uses [multiple columns](https://github.com/ycjuan/libffm/blob/master/README#L116)
```
<label> <field1>:<feature1>:<value1> <field2>:<feature2>:<value2> ...
```
vw uses [multiple pipes](https://github.com/VowpalWabbit/vowpal_wabbit/wiki/input-format)
```
1 1.0 |MetricFeatures:3.28 height:1.5 length:2.0 |Says black with white stripes |OtherFeatures NumberOfLegs:4.0 HasStripes
1 1.0 zebra|MetricFeatures:3.28 height:1.5 length:2.0 |Says black with white stripes |OtherFeatures NumberOfLegs:4.0 HasStripes
```

3. Other delimiters in open-source datasets

For example, the [Criteo counterfactual analysis challenge](https://arxiv.org/abs/1612.00367) format is similar to the vw format, but uses spaces as delimiters.
```
example ${exID}: ${hashID} ${wasAdClicked} ${propensity} ${nbSlots} ${nbCandidates} ${displayFeat1}:${v 1} ...
${wasProduct1Clicked} exid:${exID} ${productFeat1 1}:${v1 1} ...
```

It is rather difficult to enumerate all the cases, so I would recommend allowing more flexibility, e.g., with a regex format for the parser.
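To make the regex suggestion concrete, here is a minimal sketch of what a pattern-driven line parser might look like. It is only an illustration under stated assumptions: `parse_line`, `ParsedRow`, `LIBSVM_TOKEN`, and `LIBFFM_TOKEN` are hypothetical names and not part of MXNet or any of the libraries mentioned, and the sketch covers only the simple space-delimited `label token token ...` layouts, not the vw pipe syntax or group files.

```python
import re
from typing import Dict, List, NamedTuple


class ParsedRow(NamedTuple):
    """One parsed line: a numeric label plus the named groups of each token."""
    label: float
    features: List[Dict[str, str]]


# Hypothetical per-token patterns; the named groups define the output fields.
# Plain sparse format: <feature>:<value>
LIBSVM_TOKEN = re.compile(r"(?P<feature>\d+):(?P<value>[^\s:]+)")
# libffm-style format: <field>:<feature>:<value>
LIBFFM_TOKEN = re.compile(r"(?P<field>\d+):(?P<feature>\d+):(?P<value>[^\s:]+)")


def parse_line(line: str, token_pattern: "re.Pattern") -> ParsedRow:
    """Split a line on whitespace and match every token after the label."""
    label_text, *tokens = line.split()
    features = []
    for token in tokens:
        match = token_pattern.fullmatch(token)
        if match is None:
            raise ValueError(
                f"token {token!r} does not match {token_pattern.pattern!r}"
            )
        # groupdict() returns the named groups as strings, e.g. {"feature": "3", ...}
        features.append(match.groupdict())
    return ParsedRow(float(label_text), features)


if __name__ == "__main__":
    print(parse_line("1 3:0.5 7:2", LIBSVM_TOKEN))
    print(parse_line("0 1:3:0.5 2:7:1", LIBFFM_TOKEN))
```

Because the caller supplies the pattern, supporting another token layout would only require a new regular expression rather than a new parser, which is the flexibility argued for above.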
