It's definitely just a typo. The ordered categories are A, C, B, so the second split candidate should be A, C | B; it can't be A, B | C. Just open a PR.
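
For reference, here is a minimal Python sketch of the ordering described in that docs passage. It is only an illustration, not MLlib's actual implementation, and the function name `ordered_split_candidates` is made up for this example. It sorts the categories by their proportion of label 1 and then splits between adjacent categories in that order:

```python
# Proportion of label 1 per category, taken from the quoted docs passage.
label1_proportion = {"A": 0.2, "B": 0.6, "C": 0.4}


def ordered_split_candidates(proportions):
    """Hypothetical helper: order categories by proportion of label 1,
    then return every split between adjacent categories in that order."""
    ordered = sorted(proportions, key=proportions.get)
    return [(ordered[:i], ordered[i:]) for i in range(1, len(ordered))]


candidates = ordered_split_candidates(label1_proportion)
for left, right in candidates:
    print(", ".join(left), "|", ", ".join(right))
# prints:
# A | C, B
# A, C | B
```

With three categories this gives two candidates instead of the three possible unordered two-way partitions, which is the point of ordering them first.
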

On Thu, Aug 7, 2014 at 2:11 AM, Matt Forbes <m...@tellapart.com> wrote:
> I found the section on ordering categorical features really interesting,
> but the A, B, C example seemed inconsistent. Am I interpreting this passage
> wrong, or are there typos? Aren't the split candidates A | C, B and
> A, C | B?
>
> For example, for a binary classification problem with one categorical
> feature with three categories A, B and C with corresponding proportion of
> label 1 as 0.2, 0.6 and 0.4, the categorical features are ordered as A
> followed by C followed B or A, B, C. The two split candidates are A | C, B
> and A , B | C where | denotes the split.