Hi everyone, my apologies if this has been asked/answered somewhere else; I 
couldn't find it.

I followed the GluonCV tutorial on action recognition, "fine-tuning with your 
custom dataset". I am using the slowfast_4x16_resnet50_custom model, and 
following the advice from [this post](https://github.com/dmlc/gluon-cv/issues/1187) 
I was able to make it work. As shown there, I am using the VideoClsCustom class 
to load the dataset:

```python
from gluoncv.data import VideoClsCustom

train_dataset = VideoClsCustom(root=YOUR_ROOT_PATH, setting=YOUR_SETTING_FILE,
                               train=True, new_length=64, slowfast=True,
                               slow_temporal_stride=16, fast_temporal_stride=2,
                               transform=transform_train)
```

In that post there is a clarification that said: "Basically this means, we 
randomly select 64 consecutive frames. For fast branch, we use a temporal 
stride of 2 to sample the 64 frames into 32 frames. For slow branch, we use a 
temporal stride of 16 to sample the 64 frames into 4 frames. Then we 
concatenate them together and feed it to the network as input."
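
For reference, here is a rough sketch of how I understand that sampling (my own illustration, not the library's actual code): take 64 consecutive frame indices starting at a random offset, keep every 2nd index for the fast branch and every 16th for the slow branch, then concatenate. It assumes the video has at least 64 frames, which is exactly where my question comes in:

```python
import random

def sample_slowfast_indices(num_frames, new_length=64,
                            fast_stride=2, slow_stride=16):
    """Illustrative sketch of slow/fast frame sampling (not GluonCV's code).

    Picks `new_length` consecutive frame indices starting at a random offset,
    then subsamples them with the fast and slow temporal strides.
    Assumes the clip has at least `new_length` frames.
    """
    offset = random.randint(0, num_frames - new_length)
    window = list(range(offset, offset + new_length))
    fast_ids = window[::fast_stride]    # 64 / 2  = 32 frames (fast branch)
    slow_ids = window[::slow_stride]    # 64 / 16 = 4 frames  (slow branch)
    return fast_ids + slow_ids          # concatenated: 36 frame indices

# e.g. a 300-frame video -> 32 fast + 4 slow = 36 sampled indices
print(len(sample_slowfast_indices(300)))  # 36
```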

The thing is that most of my videos are short clips of about 1 second, with 
30 frames (30 FPS). What happens when the dataset has videos with fewer than 
64 frames (and sometimes even fewer than 32)?

Is the data loader duplicating some frames (upsampling), or is it filling them 
with noise, or something else?
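
So far the only way I've found to poke at this is to inspect what the loader returns for one of the short clips (a minimal sketch; I'm assuming each dataset item is a (clip, label) pair and that the exact array shape depends on the transform used):

```python
# Minimal sketch: inspect what the loader returns for a short clip.
# Assumes train_dataset is built as above; the clip's shape should show
# whether the 32 + 4 sampled frames are still produced for short videos.
clip, label = train_dataset[0]
print(type(clip), getattr(clip, "shape", None), label)
```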

Thank you in advance.
