I am taking a free course on Tiny Machine Learning. I was wondering why they 
converted speech (for an extremely simple recognition task) into (a form of) 
imagery, i.e. a spectrogram. Part of the reason is that by extracting the 
frequencies of the speech (with a Fourier Transform), the output can be 
simplified by filtering for the frequency bands that humans are good at 
detecting. But I believe there is another reason. Adding the dimension of 
frequency to the dimensions of amplitude and time helps to usefully 
discretize the input. The CNN then compresses this expanded representation. 
The point is that the input data is expanded before it is fed to the CNN. 
This makes sense as long as the additional dimension of discretization 
contains significantly useful information about the relations within the 
'string' of data, and those relations can be used by an AI application. 
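
To make the idea concrete, here is a minimal sketch (using only NumPy, not 
the course's actual pipeline) of the transformation described above: a 1-D 
audio signal is framed, Fourier-transformed, and passed through triangular 
filters spaced on the mel scale (a standard approximation of human pitch 
perception), yielding a 2-D time-frequency "image" a CNN can consume. All 
parameter values (frame size, hop, number of mel bands) are illustrative 
assumptions, not the course's settings.

```python
import numpy as np

def hz_to_mel(f):
    # Mel scale: roughly even spacing as perceived by human hearing
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters centered at points evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=256, hop=128, n_mels=20):
    # Frame the signal, window each frame, take FFT magnitude squared
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Collapse the FFT bins into mel bands: this 2-D array is the "image"
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)

# One second of a 440 Hz tone at 8 kHz becomes a (time, frequency) image
sr = 8000
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr)
print(spec.shape)  # (61, 20): 61 time frames x 20 mel bands
```

Note how a 1-D string of 8000 samples is expanded into a 61 x 20 grid: the 
added frequency axis makes relations between nearby samples explicit, which 
is exactly what the CNN's 2-D convolutions are built to compress.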
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Td08f1cb9cdd5e5d9-M417cc558c6bceecbffb7fa49
Delivery options: https://agi.topicbox.com/groups/agi/subscription