On 21 August 2014 00:47, Tony Arcieri <[email protected]> wrote:
> On Wed, Aug 20, 2014 at 7:54 AM, Tom Ritter <[email protected]> wrote:
>>
>> I have strong doubts about accelerometer-based audio pickup in
>> real-world settings.
>
> https://crypto.stanford.edu/gyrophone/files/gyromic.pdf

Yup, that's what I was talking about. Specifically things like:

* As the sampling rate of the gyroscope is limited, one cannot fully
  reconstruct a comprehensible speech from measurements of a single
  gyroscope. Therefore, we resort to automatic speech recognition.

* We extract features from the gyroscope measurements using various
  signal processing methods and train machine learning algorithms for
  recognition. We achieve about 50% success rate for speaker
  identification from a set of 10 speakers.

* Our setup consisted of a set of loudspeakers that included a
  sub-woofer and two tweeters (depicted in Figure 5). The sub-woofer
  was particularly important for experimenting with low-frequency
  tones below 200 Hz. The playback was done at volume of approximately
  75 dB to obtain as high SNR as possible for our experiments. This
  means that for more restrictive attack scenarios (farther source,
  lower volume) there will be a need to handle low SNR, perhaps by
  filtering out the noise or applying some other preprocessing for
  emphasizing the speech signal.

* Due to the low sampling frequency of the gyro, a recognition of
  speaker-independent general speech would be an ambitious long-term
  task.

* Therefore, in this work we set out to recognize speech of a limited
  dictionary, the recognition of which would still leak substantial
  private information. For this work we chose to focus on the digits
  dictionary, which includes the words: zero, one, two..., nine, and
  "oh". Recognition of such words would enable an attacker to
  eavesdrop on private information, such as credit card numbers,
  telephone numbers, social security numbers and the like. This
  information may be eavesdropped when the victim speaks over or next
  to the phone.

* This is a subset of a corpus published in [33]. It includes speech
  of isolated digits, i.e., 11 words per speaker where each speaker
  recorded each word twice. There are 10 speakers (5 female and 5
  male). In total, there are 10 x 11 x 2 = 220 recordings.
  The corpus is digitized at 20 kHz.

When doing this type of research, there are things that can be solved
with more work, and there are things that turn out to just be
limitations, beyond which you're really interpreting data in unfounded
ways. (Like 'Zoom and Enhance' on digital images.)

Their work is great, and I think it's important to push the limits of
what can be done with it to understand the attack surface - just like
we're doing with RF retroreflectors. But I have my doubts that this
could be extended so far as to enable really good audio pickup from a
phone in someone's pocket with ambient noises around.

Similarly, if someone pointed at this research and said "My laptop has
an accelerometer for turning the hard drive off! I removed the
microphone, but am I still vulnerable via this specific vector?" - I
would seriously doubt it.

-tom
_______________________________________________
Messaging mailing list
[email protected]
https://moderncrypto.org/mailman/listinfo/messaging
