tkonolige commented on code in PR #11479:
URL: https://github.com/apache/tvm/pull/11479#discussion_r887304060


##########
python/tvm/autotvm/tuner/xgboost_cost_model.py:
##########
@@ -243,18 +243,27 @@ def fit_log(self, records, plan_size, min_seed_records=500):
         else:
             raise RuntimeError("Invalid feature type: " + self.fea_type)
         result = pool.map_with_error_catching(feature_extract_func, data)
+        result = list(result)  # store results so we can iterate through them twice
 
-        # filter out feature with different shapes
-        fea_len = len(self._get_feature([0])[0])
+        # get maximum feature length
+        fea_len = -1
+        for res in result:
+            if res.status != StatusKind.COMPLETE:
+                continue
+            x, _ = res.value
+            fea_len = max(fea_len, x.shape[0])
 
         xs, ys = [], []
         for res in result:
             if res.status != StatusKind.COMPLETE:
                 continue
             x, y = res.value
-            if len(x) == fea_len:

Review Comment:
   Well, the model is only trained on data that it has already predicted 
performance for. The prediction code would error out before we ever reach 
this training code, which fails gracefully by dropping data. I'm not sure 
how major a change you would consider this, then...
   
   Usually the xgboost model keeps all previous features in its training set 
(sometimes it drops the oldest ones). With this PR, the feature size is 
determined by the longest feature in the training set, so if a new round 
produces a longer feature, the feature length will grow.
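
   For illustration, the idea of sizing features to the longest vector and padding the rest can be sketched as follows. This is a hypothetical standalone example, not the actual autotvm code; `pad_features` and its inputs are made up for the sketch:
   
   ```python
   import numpy as np
   
   def pad_features(results, pad_value=0.0):
       """Pad variable-length feature vectors to the longest one.
   
       `results` stands in for the successfully extracted
       (feature_vector, label) pairs; this helper is illustrative only.
       """
       # feature length is determined by the longest feature present
       fea_len = max(x.shape[0] for x, _ in results)
       xs, ys = [], []
       for x, y in results:
           # right-pad shorter feature vectors instead of dropping them
           padded = np.pad(x, (0, fea_len - x.shape[0]),
                           constant_values=pad_value)
           xs.append(padded)
           ys.append(y)
       return np.stack(xs), np.array(ys)
   
   xs, ys = pad_features([(np.array([1.0, 2.0]), 0.5),
                          (np.array([3.0]), 0.9)])
   print(xs.shape)  # (2, 2): shorter vector padded up to length 2
   ```
   
   The point of the sketch: a later round with a longer feature vector simply raises `fea_len`, so earlier (shorter) features get more padding rather than being filtered out.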



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
