ShootingSpace commented on a change in pull request #10461: allow user to define unknown token symbol
URL: https://github.com/apache/incubator-mxnet/pull/10461#discussion_r180169879
########## File path: python/mxnet/rnn/io.py ##########
@@ -58,6 +62,8 @@ def encode_sentences(sentences, vocab=None, invalid_label=-1, invalid_key='\n',
     if vocab is None:
         vocab = {invalid_key: invalid_label}
         new_vocab = True
+    elif unknown_token:
+        new_vocab = True

Review comment:
Hi, there is a situation where users have their own dictionary, say `dict = {'a': 1, 'b': 2, 'c': 3}`, where 'a', 'b' and 'c' are the frequent tokens the user cares about. All remaining rare tokens are treated as a single unknown token (say the user defines it as `'UNK'`). Encoding then returns a list such as `[[1, 2, 3], [2, 3, 0]]`, and the key-value pair `'UNK': 0` is added to the dictionary. The previous version raises an error in this case, because by default it assumes the user provides a complete vocabulary.
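The behavior described in the comment can be sketched in plain Python. `encode_sentences_sketch` is a hypothetical stand-in, not MXNet's `encode_sentences`; it assumes that the unknown token receives the next free id counting up from `start_label` (skipping `invalid_label` and ids already in the user's dictionary), and that a missing token with no `unknown_token` raises `KeyError`. The exact error type and id-allocation details of the real function are assumptions here.

```python
def encode_sentences_sketch(sentences, vocab=None, invalid_label=-1,
                            invalid_key='\n', start_label=0,
                            unknown_token=None):
    # Hypothetical sketch of the behavior discussed in the review;
    # it is NOT the mxnet.rnn.io.encode_sentences implementation.
    if vocab is None:
        # No vocabulary supplied: build one from the data.
        vocab = {invalid_key: invalid_label}
        new_vocab = True
    else:
        # Copy so the caller's dictionary is left untouched (a sketch choice).
        vocab = dict(vocab)
        new_vocab = False

    used_ids = set(vocab.values())
    next_id = start_label

    def allocate(token):
        # Give `token` the smallest free id >= next_id that is not invalid_label.
        nonlocal next_id
        while next_id == invalid_label or next_id in used_ids:
            next_id += 1
        vocab[token] = next_id
        used_ids.add(next_id)
        next_id += 1

    encoded = []
    for sentence in sentences:
        coded = []
        for token in sentence:
            if token not in vocab:
                if new_vocab:
                    allocate(token)
                elif unknown_token is not None:
                    # All rare tokens share one id; add it on first use.
                    if unknown_token not in vocab:
                        allocate(unknown_token)
                    token = unknown_token
                else:
                    # User-supplied vocabulary without an unknown token:
                    # the vocabulary is assumed to be complete.
                    raise KeyError("Unknown token %r and no unknown_token given" % token)
            coded.append(vocab[token])
        encoded.append(coded)
    return encoded, vocab


user_vocab = {'a': 1, 'b': 2, 'c': 3}
encoded, vocab = encode_sentences_sketch(
    [['a', 'b', 'c'], ['b', 'c', 'x']], vocab=user_vocab, unknown_token='UNK')
print(encoded)  # [[1, 2, 3], [2, 3, 0]]
print(vocab)    # {'a': 1, 'b': 2, 'c': 3, 'UNK': 0}
```

Copying the caller's dictionary is only a choice made in this sketch; the comment does not say whether the passed-in `vocab` is mutated, only that `'UNK': 0` ends up in the returned dictionary.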