I am taking a free course on Tiny Machine Learning. I was wondering why they
converted speech (for an extremely simple recognition task) into (a form of)
imagery. Part of the reason is that extracting the frequency content of the
speech (with a Fourier transform) lets the output be simplified by filtering it
down to the frequency bands that humans are good at detecting. But I believe
there is another reason: adding the dimension of frequency to the dimensions of
amplitude and time helps to usefully discretize the input. The CNN then
compresses that representation. The point is that the input data is expanded
before it is fed to the CNN. This makes sense as long as the additional
dimension of discretization contains significantly useful information about the
relations within the 'string' of data, and those relations can be used by an AI
application.
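To make the "expansion" concrete, here is a minimal sketch of the idea in
NumPy: a 1-D audio signal is sliced into overlapping frames and each frame is
Fourier-transformed, turning one dimension (time) into two (time x frequency),
which is the image-like array a CNN can then consume. The frame length, hop
size, and test tone are illustrative choices, not the course's actual pipeline
(which would typically add a mel-scale filter bank on top of this).

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Slice a 1-D signal into overlapping windowed frames and take the
    magnitude FFT of each frame, yielding a 2-D (time x frequency) array."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies: frame_len//2 + 1 bins
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 440 Hz tone sampled at 16 kHz (a stand-in for speech)
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(audio)
print(spec.shape)  # (124, 129): 124 time frames x 129 frequency bins
```

The 1-D input of 16000 samples becomes a 2-D array, and the energy
concentrates in the bin nearest 440 Hz (bin 7 at a resolution of
16000/256 = 62.5 Hz per bin), which is exactly the kind of localized
structure a CNN's filters can pick up.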
------------------------------------------
Artificial General Intelligence List: AGI
Permalink:
https://agi.topicbox.com/groups/agi/Td08f1cb9cdd5e5d9-M417cc558c6bceecbffb7fa49